DIFFERENTIALLY REGULATED HEPATOCELLULAR CARCINOMA
GENES AND USES THEREOF 

CROSS-REFERENCE TO RELATED APPLICATIONS 
This application claims the benefit of U.S. provisional application number 
60/475,508, filed June 4, 2003, which is hereby incorporated by reference. 

FIELD OF THE INVENTION 
The invention is in the field of diagnostics and therapeutics for cancer. More 
specifically, the invention is in the field of diagnostics and therapeutics for 
hepatocellular carcinoma. 



BACKGROUND OF THE INVENTION 
Hepatocellular carcinoma (HCC) is the most common primary malignant
tumor of the liver and accounts for more than 70% of liver cancers worldwide (Parkin
et al., 1999). Many risk factors have been associated with the development of HCC,
including hepatitis B (HBV) and hepatitis C (HCV) viral infection, cirrhosis, male
gender, exposure to toxins, etc. Death generally occurs due to liver failure associated 
with cirrhosis and/or rapid outgrowth of multiple nodules. Approximately 0.25-1 
million new cases of HCC are diagnosed each year, and the cancer is especially 
prevalent in Southeast Asia, China, and sub-Saharan Africa. While surgical resection 
is considered to be the main curative treatment, only 10-15% of cases are suitable for 
surgery at the time of presentation. This is because either the disease is detected at an
advanced stage or the underlying poor liver functional reserve
precludes surgical intervention.

Diagnosis of HCC has included detection of the presence of a liver mass on 
radiological investigations and the detection of elevated serum alpha fetoprotein 
(AFP) levels (Yu and Keeffe, 2003). However, elevation of AFP is not exclusive to 
HCC and has been observed in benign hepatic disease, such as liver cirrhosis, and 
other cancers such as germ cell cancer (Bosl and Head, 1994). Treatment of HCC has 
included interferon therapy and antiviral drugs, but the results have proved
unpredictable and the effectiveness may be limited (Lee, 1997; Yu and Keeffe, 2003).
Microarrays have been used to address changes in gene expression of HCC (Chen et
al., 2002; Okabe et al., 2001; Honda et al., 2001; Shirota et al., 2001; Tackels-Horne et
al., 2001; Xu et al., 2001a; Xu et al., 2001b). However, these reports were restricted to
the tissue samples selected for each study and exhibited wide variation in the results, 
thus limiting the potential significance and utility of the data. 

SUMMARY OF THE INVENTION 

The invention provides in part molecular markers for hepatocellular carcinoma 
(HCC) that may be used for HCC diagnosis, to assess HCC progression or regression, 
or the efficacy and/or toxicity of HCC therapeutics, and/or to identify candidate 
compounds for HCC therapy, with high predictive accuracy. 

In one aspect, the invention provides a composition including an addressable 
collection of two or more nucleic acid molecules, or polypeptides encoded by these 
nucleic acid molecules, that are differentially expressed in hepatocellular carcinoma, 
where the nucleic acid molecules consist essentially of the nucleic acid molecules set
forth in any one or more of Tables 1 through 4 or complements, fragments, variants, 
or analogs thereof. The composition may include all of the nucleic acid molecules, 
or their encoded polypeptides, set forth in any one or more of Tables 1 through 4 or 
complements, fragments, variants, or analogs thereof, or any subset of these nucleic 
acid molecules or polypeptides. The nucleic acid molecules or polypeptides may be 
differentially expressed between hepatocellular carcinoma tissue and non-tumor 
tissue. The nucleic acid molecules or the polypeptides may be attached to a solid 
support. The compositions may be used in the preparation of a medicament for 
diagnosis or therapy of hepatocellular carcinoma. 

In other aspects, the invention provides a method of diagnosing hepatocellular 
carcinoma in a subject by obtaining a sample from the subject and detecting the level 
of expression of two or more nucleic acid molecules or expression products thereof in 
the sample, where the nucleic acid molecules consist essentially of the nucleic acid 
molecules set forth in any one or more of Tables 1 through 4 or complements, 
fragments, variants, or analogs thereof.

In other aspects, the invention provides a method of monitoring the progression of
hepatocellular carcinoma in a subject by obtaining a sample from the subject and 
detecting the level of expression of two or more nucleic acid molecules or expression 
products thereof in the sample, where the nucleic acid molecules consist essentially of 
the nucleic acid molecules set forth in any one or more of Tables 1 through 4, or 
complements, fragments, variants, or analogs thereof. The sample may be obtained at 
two or more time points. The method may further include comparing the level of 
expression of the nucleic acid molecules or expression products at two or more time 
points. 

In other aspects, the invention provides a method of monitoring the efficacy of
a hepatocellular carcinoma therapy in a subject by administering the therapy to the
subject, obtaining a sample from the subject, and detecting the level of expression of 
two or more nucleic acid molecules or expression products thereof in the sample, 
where the nucleic acid molecules consist essentially of the nucleic acid molecules set 
forth in any one or more of Tables 1 through 4, or complements, fragments, variants, 
or analogs thereof. The therapy may be administered at two or more administration
time points. The sample may be obtained at two or more sampling time points. The 
method may further include comparing the level of expression of the nucleic acid 
molecules or expression products at two or more administration time points, and/or at 
two or more sampling time points. 

In other aspects, the invention provides a method of screening a compound for 
treating hepatocellular carcinoma by contacting a sample with a test compound and 
detecting the level of expression of two or more nucleic acid molecules or expression 
products thereof in the sample, where the nucleic acid molecules consist essentially of 
the nucleic acid molecules set forth in any one or more of Tables 1 through 4, or 
complements, fragments, variants or analogs thereof. 

In alternate embodiments of the various aspects, the sample may be liver or 
serum, or may be suspected of being cancerous, or may be non-cancerous. The 
methods may further include comparing the level of expression of the nucleic acid 
molecules or expression products thereof in a non-cancerous sample and in a sample 
suspected of being cancerous. Differential expression of the nucleic acid molecules or 
expression products thereof may be indicative of hepatocellular carcinoma, or of
progression of hepatocellular carcinoma, or of the efficacy of the hepatocellular
carcinoma therapy. The subject may be suspected of having hepatocellular 
carcinoma. The subject may be a human. 

In alternate embodiments of the various aspects, the method may further 
include comparing the level of expression of two or more nucleic acid molecules or
expression products thereof with a standard, or further include preparing a gene 
expression profile. The method may be a high throughput method. 

In other aspects, the invention provides a solid support including two or more
nucleic acid molecules or polypeptides encoded by these nucleic acid molecules that
are differentially expressed in hepatocellular carcinoma, where the nucleic acid
molecules consist essentially of the nucleic acid molecules set forth in Tables 1
through 4 or complements, fragments, variants, or analogs thereof. The nucleic acid
molecules may consist essentially of all the nucleic acid molecules set forth in any
one or more of Tables 1 through 4, and/or be differentially expressed between
hepatocellular carcinoma tissue and non-tumor tissue. The polypeptides may consist
essentially of the polypeptides encoded by all the nucleic acid molecules set forth in
any one or more of Tables 1 through 4, and/or be differentially expressed between
hepatocellular carcinoma tissue and non-tumor tissue. The nucleic acid molecules or
the polypeptides may be covalently or non-covalently attached to the solid support
(e.g., a microarray).

In other aspects, the invention provides a database including information 
identifying the expression level in liver tissue (e.g., cancerous or non-cancerous 
tissue) of two or more nucleic acid molecules or expression products thereof, where 
the nucleic acid molecules consist essentially of the nucleic acid molecules set forth in
any one or more of Tables 1 through 4, or complements, fragments, variants, or
analogs thereof. 

A "composition" as used herein includes a plurality of the nucleic acid 
molecules described herein, including complements, analogs, variants, and fragments 
thereof. A composition as used herein also includes a plurality of polypeptides 
encoded by the nucleic acid molecules described herein, and complements, analogs,
variants, and fragments thereof. A composition as used herein also includes a plurality
of polypeptides capable of specifically binding to the polypeptides or nucleic acid
molecules described herein (e.g., antibodies). The composition may include any
combination of the nucleic acid molecules described herein, including complements, 
analogs, variants, and fragments thereof, or polypeptides encoded by these nucleic 
acid molecules. Accordingly, the composition may include 2, 3, 4, 5, 6, 7, 8, 9, 10, 
etc. up to all of the nucleic acid molecules or polypeptides described herein, e.g., in 
any one or more of the Tables or Figures herein. In some embodiments, the 
composition may include subsets of the nucleic acid molecules or polypeptides 
described herein, e.g., in any one or more of the Tables or Figures herein, for 
example, subsets grouped by protein function or characteristics, e.g., proteins involved
in the ubiquitination pathway, or proteins localized to a particular cellular
compartment. These nucleic acid molecules or polypeptides may, for example, be used
with a substrate (e.g., a solid substrate or a liquid substrate) in a variety of
applications, including the diagnosis of HCC, or monitoring the progression of HCC. 

By "addressable collection" is meant a combination of nucleic acid molecules 
or polypeptides capable of being detected by, for example, the use of hybridization 
techniques or antibody binding techniques or by any other means of detection known 
to those of ordinary skill in the art. 

The terms "nucleic acid" or "nucleic acid molecule" encompass both RNA 
(plus and minus strands) and DNA, including cDNA, genomic DNA, and synthetic 
(e.g., chemically synthesized) DNA. The nucleic acid may be double-stranded or 
single-stranded. Where single-stranded, the nucleic acid may be the sense strand or 
the antisense strand. A nucleic acid molecule may be any chain of two or more 
covalently bonded nucleotides, including naturally occurring or non-naturally 
occurring nucleotides, or nucleotide analogs or derivatives. By "RNA" is meant a 
sequence of two or more covalently bonded, naturally occurring or modified 
ribonucleotides. One example of a modified RNA included within this term is 
phosphorothioate RNA. By "DNA" is meant a sequence of two or more covalently 
bonded, naturally occurring or modified deoxyribonucleotides. By "cDNA" is meant 
complementary or copy DNA produced from an RNA template by the action of RNA- 
dependent DNA polymerase (reverse transcriptase). Thus a "cDNA clone" means a 
duplex DNA sequence complementary to an RNA molecule of interest, carried in a 
cloning vector. An "oligonucleotide" as used herein is a single-stranded molecule
which may be used in hybridization or amplification technologies. In general, an
oligonucleotide may be any integer from about 15 to about 100 nucleotides in length,
but may also be of greater length. A "probe" or "primer" is a single-stranded DNA 
or RNA molecule of defined sequence that can base pair to a second DNA or RNA 
molecule that contains a complementary sequence (the target). The stability of the 
resulting hybrid molecule depends upon the extent of the base pairing that occurs, and 
is affected by parameters such as the degree of complementarity between the probe 
and target molecule, and the degree of stringency of the hybridization conditions. The 
degree of hybridization stringency is affected by parameters such as the temperature, 
salt concentration, and concentration of organic molecules, such as formamide, and is 
determined by methods that are known to those skilled in the art. Probes or primers 
specific for the nucleic acid sequences described herein, or portions thereof, may vary 
in length by any integer from at least 8 nucleotides to over 500 nucleotides, including 
any value in between, depending on the purpose for which, and conditions under 
which, the probe or primer is used. For example, a probe or primer may be 8, 10, 15, 
20, or 25 nucleotides in length, or may be at least 30, 40, 50, or 60 nucleotides in 
length, or may be over 100, 200, 500, or 1000 nucleotides in length. Probes or
primers specific for the nucleic acid molecules described herein may have greater than 
any integer between 20-30% sequence identity, or at least any integer between 55- 
75% sequence identity, or at least any integer between 75-85% sequence identity, or 
at least any integer between 85-99% sequence identity, or 100% sequence identity to 
the nucleic acid sequences described herein. Probes or primers can be detectably- 
labeled, either radioactively or non-radioactively, by methods that are known to those 
skilled in the art. Probes or primers can be used for methods involving nucleic acid
hybridization, such as nucleic acid sequencing, nucleic acid amplification by the
polymerase chain reaction, single stranded conformational polymorphism (SSCP)
analysis, restriction fragment length polymorphism (RFLP) analysis, Southern hybridization,
northern hybridization, in situ hybridization, electrophoretic mobility shift assay
(EMSA), microarray, and other methods that are known to those skilled in the art.
Probes or primers may be derived from genomic DNA or cDNA, for example, by
amplification, or from cloned DNA segments, or may be chemically synthesized.
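
As an illustration of the sequence identity ranges recited above for probes or primers, the following minimal Python sketch computes percent identity between a probe and a target region. It assumes the two sequences have already been aligned to equal length without gaps; the function name percent_identity and the example sequences are hypothetical and are not taken from the Tables.

```python
def percent_identity(probe: str, target: str) -> float:
    """Percent of positions at which two equal-length, pre-aligned sequences match."""
    if len(probe) != len(target):
        raise ValueError("sequences must be aligned to the same length")
    if not probe:
        raise ValueError("sequences must be non-empty")
    # Case-insensitive, position-by-position comparison.
    matches = sum(a == b for a, b in zip(probe.upper(), target.upper()))
    return 100.0 * matches / len(probe)


# Hypothetical 20-nucleotide probe and target that differ at three positions.
probe = "ATGCGTACGTTAGCCTAGGA"
target = "ATGCGTTCGTTAGCATAGGC"
identity = percent_identity(probe, target)
```

Here 17 of 20 positions match, so the sketch reports 85% identity. A practical comparison would typically use an alignment tool that handles gaps, which this sketch deliberately omits.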

The "expression product" of a nucleic acid molecule may be any polypeptide
encoded by that nucleic acid molecule. Generally, the polypeptide is capable of being 
expressed. 

A "protein," "peptide" or "polypeptide" is any chain of two or more amino 
acids, including naturally occurring or non-naturally occurring amino acids or amino 
acid analogues, regardless of post-translational modification (e.g., glycosylation or 
phosphorylation). An "amino acid sequence", "polypeptide", "peptide" or "protein" of 
the invention may include peptides or proteins that have abnormal linkages, cross 
links and end caps, non-peptidyl bonds or alternative modifying groups. Such 
modified peptides are also within the scope of the invention. The term "modifying 
group" is intended to include structures that are directly attached to the peptidic 
structure (e.g., by covalent coupling), as well as those that are indirectly attached to 
the peptidic structure (e.g., by a stable non-covalent association or by covalent 
coupling to additional amino acid residues, or mimetics, analogues or derivatives 
thereof, which may flank the core peptidic structure). For example, the modifying 
group can be coupled to the amino-terminus or carboxy-terminus of a peptidic
structure, or to a peptidic or peptidomimetic region flanking the core domain.
Alternatively, the modifying group can be coupled to a side chain of at least one
amino acid residue of a peptidic structure, or to a peptidic or peptidomimetic region
flanking the core domain (e.g., through the epsilon amino group of a lysyl residue(s),
through the carboxyl group of an aspartic acid residue(s) or a glutamic acid residue(s),
through a hydroxy group of a tyrosyl residue(s), a serine residue(s) or a threonine
residue(s) or other suitable reactive group on an amino acid side chain). Modifying
groups covalently coupled to the peptidic structure can be attached by means and 
using methods well known in the art for linking chemical structures, including, for 
example, amide, alkylamino, carbamate or urea bonds. Peptides according to the 
invention may include peptides encoded by the nucleic acid molecules of Tables 1 
through 4 or complements or analogs thereof. 

By "differential expression" or "differentially expressed" is meant increased, 
upregulated or present, or decreased, downregulated or absent, gene expression as 
detected by the absence, presence, or change (up or down) in the amount of 
transcribed messenger RNA or translated protein in a sample. For example, the
change may be detected by comparison of the gene expression level
between an HCC sample and a non-tumor sample. The absolute amount of change of
gene expression is not important, as long as the amount of change is reproducible and
measurable. In some embodiments, the change (up or down) in the amount of
transcribed messenger RNA or translated protein may be at least 1-fold or at least 1.5-fold
or may be over 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, or 5.0-fold. In some embodiments, the
change in the amount of transcribed messenger RNA or translated protein may be 40%,
50%, 60%, 70%, 80%, 90%, or 100%.
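
The fold-change and percent-change measures described above can be sketched in Python as follows. This is only an illustrative sketch: the function names, the symmetric treatment of up- and downregulation on a log base 2 scale, and the example values are assumptions; the 1.5-fold default threshold is one of the values recited above.

```python
import math


def fold_change(tumor: float, non_tumor: float) -> float:
    """Ratio of HCC expression to non-tumor expression (both must be positive)."""
    if tumor <= 0 or non_tumor <= 0:
        raise ValueError("expression values must be positive")
    return tumor / non_tumor


def percent_change(tumor: float, non_tumor: float) -> float:
    """Change in expression as a percentage of the non-tumor level."""
    if non_tumor <= 0:
        raise ValueError("non-tumor expression must be positive")
    return 100.0 * (tumor - non_tumor) / non_tumor


def classify(tumor: float, non_tumor: float, threshold: float = 1.5) -> str:
    """Label a gene 'up', 'down', or 'unchanged' using a symmetric fold threshold."""
    log_ratio = math.log2(fold_change(tumor, non_tumor))
    cutoff = math.log2(threshold)  # assumed symmetric cutoff on the log2 scale
    if log_ratio >= cutoff:
        return "up"
    if log_ratio <= -cutoff:
        return "down"
    return "unchanged"


# Hypothetical expression levels in arbitrary units.
label_up = classify(300.0, 100.0)
label_down = classify(100.0, 300.0)
label_flat = classify(110.0, 100.0)
```

Working on the log scale makes a 3-fold increase and a 3-fold decrease equally distant from no change, which is one common convention and not necessarily the one used to generate the Tables.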

By "detecting" it is intended to include determining the presence or absence,
or quantifying the amount, of a nucleic acid molecule or polypeptide of the
invention. The term thus refers to the use of the materials, compositions, and
methods of the present invention for qualitative and quantitative determinations. For 
example, detecting an increase in gene expression levels may include quantifying a 
change of any value between 10% and 90%, or of any value between 30% and 60%, 
or over 100%, of any of the nucleic acid molecules or polypeptides of the invention 
when compared to a control. In other embodiments, detecting an increase in gene 
expression levels may include quantifying a change of any value between 1 to 5 fold 
or more of any of the nucleic acid molecules or polypeptides of the invention when 
compared to a control. 

"Hepatocellular carcinoma" is cancer that arises from hepatocytes, the major
cell type of the liver. It is a form of adenocarcinoma, and is the most common type 
of liver tumor. "Non-tumor" tissue refers to tissue or cells that are non-cancerous. In 
some embodiments, non-tumor tissue may include tissue or cells from a subject 
having a liver disorder, such as HBV or HCV infection, cirrhosis, exposure to
aflatoxins, etc. The phrase "suspected of being cancerous" as used herein means an
HCC tissue sample believed by one of ordinary skill in the art to contain HCC cells. 
By "non-cancerous" or "non-tumor" is meant a tissue sample demonstrated by 
standard diagnostic or other techniques (e.g., histologic staining, microscopic 
analysis, immunoassay, etc.) to contain no HCC cells or evidence of HCC. 

A "sample" can be any organ, tissue, cell, or cell extract isolated from a 
subject, such as a sample isolated from a mammal having a hepatocellular carcinoma 
or isolated from a mammal not having a hepatocellular carcinoma or a tumor. For
example, a sample can include, without limitation, tissue such as liver tissue (e.g.,
from a biopsy or autopsy), cells, peripheral blood, whole blood, red cell concentrates, 
platelet concentrates, leukocyte concentrates, blood cell proteins, blood plasma, 
platelet-rich plasma, a plasma concentrate, a precipitate from any fractionation of the 
plasma, a supernatant from any fractionation of the plasma, blood plasma protein 
fractions, purified or partially purified blood proteins or other components, serum, 
semen, mammalian colostrum, milk, urine, stool, saliva, placental extracts, amniotic 
fluid, a cryoprecipitate, a cryosupernatant, a cell lysate, mammalian cell culture or 
culture medium, products of fermentation, ascitic fluid, proteins present in blood 
cells, solid tumours isolated from a mammal with a hepatocellular carcinoma, or any 
other specimen, or any extract thereof, obtained from a patient (human or animal), test 
subject, or experimental animal. A sample may also include, without limitation, 
products produced in cell culture by normal, non-tumor, or transformed cells (e.g., via 
recombinant DNA technology). A "sample" may also be a cell or cell line created
under experimental conditions that is not directly isolated from a subject. A sample
can also be cell-free, artificially derived or synthesised. In some embodiments,
samples refer to liver tissue or cells. In some embodiments, the liver tissue may be
from a subject having a hepatocellular carcinoma; a subject infected with a hepatitis
virus; a subject having a liver disorder, e.g., cirrhosis; or a subject having a normal
liver, e.g., not diagnosed with or suspected of having a liver disorder.

As used herein, a subject may be a human, non-human primate, rat, mouse, 
cow, horse, pig, sheep, goat, dog, cat, etc. The subject may be a clinical patient, a
clinical trial volunteer, an experimental animal, etc. The subject may be suspected of 
having HCC, be diagnosed with HCC, or be a control subject that is confirmed to not 
have HCC. Diagnostic methods for HCC and the clinical delineation of HCC 
diagnoses are known to those of ordinary skill in the art, and include biopsy including 
radiological biopsy by means of a radiological scan, laparoscopy, or open surgical
biopsy. 



BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 is a plot showing natural patterns of gene expression differences 
between HCC tumor and non-tumor liver tissue specimens based on unsupervised
clustering. The plot shows the variance of expression value for each of the gene
features across all the HCC tumor and non-tumor liver tissue specimens. The dotted 
line indicates the 500 most variable gene features. 
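
The kind of variance-based ranking that Figure 1 depicts can be sketched as follows. This is a minimal Python illustration, assuming each gene feature has one expression value per specimen and that the sample variance is the measure used; the function name, the feature names, and the toy values are hypothetical, and only the cutoff of 500 features comes from the figure description.

```python
from statistics import variance


def most_variable_features(expression, k=500):
    """Return the names of the k gene features with the largest variance.

    expression: dict mapping feature name -> list of expression values,
    one value per specimen (at least two specimens per feature).
    """
    ranked = sorted(
        expression,
        key=lambda name: variance(expression[name]),  # sample variance across specimens
        reverse=True,
    )
    return ranked[:k]


# Hypothetical toy data: four gene features measured in three specimens.
toy = {
    "feature_a": [0.1, 0.1, 0.1],   # constant across specimens
    "feature_b": [-2.0, 0.0, 2.0],  # most variable
    "feature_c": [0.0, 0.5, 1.0],
    "feature_d": [1.0, 1.1, 0.9],
}
top2 = most_variable_features(toy, k=2)
```

With the toy data, feature_b and feature_c are returned in that order. Ranking by variance needs no tumor or non-tumor labels, which is consistent with the unsupervised clustering that the figure description mentions.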

Figure 2 is a multidimensional scaling plot showing significant gene 
differential expression between HCC tumor and non-tumor liver tissues, and 
comparison with liver cancer cell lines (P<1x10^-6, approximately 1.5-fold change).
The plot illustrates the ability of the 218 outlier genes to separate HCC tumor 
specimens (black circles) from non-tumor liver tissue specimens (dark gray circles). 
The plot also shows how different liver cancer cell lines (light gray circles) are from 
the clinical tissue samples. 

Figures 3A-B characterize differentially expressed genes in HCC tumor 
specimens (P<1x10^-6, approximately 1.5-fold change). Figure 3A is a bar graph
showing the chromosomal distribution of the 218 outlier genes. The dark colored and 
light shaded bars represent genes that are at least 1.5-fold up- and downregulated, 
respectively, in HCC tumors relative to non-tumor livers. Figure 3B is a bar graph 
showing the functional characterization of the outlier genes based on Gene Ontology 
and published works. 

Figure 4 is a bar graph showing the expression of BMI-1 in HCC tumors as
determined by cDNA microarray analysis. The data are presented as the level of 
expression (log base 2) in each HCC tumor specimen with respect to the 
corresponding non-tumor liver sample. 

Figures 5A-D show real-time RT-PCR analysis of IGFBP3, ERBB3, ERBB2 
and EGFR in HCC tumor samples. The gene expression patterns for (A) IGFBP3, 
(B) ERBB3, (C) ERBB2 and (D) EGFR in all the 37 HCC tumor samples and their 
corresponding non-tumor liver tissue specimens were examined. All data were
normalized to the amount of the 'housekeeping' gene PBGD and are presented as relative
fold expression change (log base 2 ratio) in HCC tumor specimens with respect to
their corresponding non-tumor liver counterparts. A positive value indicates a
higher expression level, while a negative value indicates a lower expression level, in
the tumor relative to the non-tumor specimen.
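
The normalization and log base 2 presentation described for Figures 5A-D can be sketched in Python as follows. This is an illustrative sketch only: it assumes that relative quantities for the gene of interest and for the housekeeping reference (PBGD in the figures) are already available for each matched pair, and it does not reproduce how those quantities were derived from the real-time RT-PCR measurements. The function name and the example values are hypothetical.

```python
import math


def normalized_log2_ratio(tumor_gene, tumor_ref, non_tumor_gene, non_tumor_ref):
    """log2 of (gene/reference in tumor) divided by (gene/reference in non-tumor)."""
    values = (tumor_gene, tumor_ref, non_tumor_gene, non_tumor_ref)
    if any(v <= 0 for v in values):
        raise ValueError("all quantities must be positive")
    tumor_norm = tumor_gene / tumor_ref              # normalize to housekeeping gene
    non_tumor_norm = non_tumor_gene / non_tumor_ref  # same for the matched non-tumor tissue
    return math.log2(tumor_norm / non_tumor_norm)


# Hypothetical relative quantities (arbitrary units) for two matched pairs.
higher_in_tumor = normalized_log2_ratio(8.0, 1.0, 2.0, 1.0)
lower_in_tumor = normalized_log2_ratio(1.0, 2.0, 4.0, 2.0)
```

In this sketch a 4-fold higher normalized level in the tumor gives a value of 2, and a 4-fold lower level gives -2, matching the sign convention stated for the figures.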

Figure 6 lists a panel of genes analyzed using real-time and semi-quantitative
RT-PCR analyses, and indicates whether the analysis was conducted in non-tumor
human tissues, and in clinical tissue samples or HCC cell lines or both.

Figures 7A-C show gene expression analysis of ARMET. Semi-quantitative
RT-PCR analysis (A) of non-tumor tissues and HCC cell lines and real time RT-PCR 
analysis of non-tumor tissues (B) and patient samples (C) was performed. GAPDH 
expression level was used as the control in the analyses of HCC cell lines vs. non- 
tumor human tissues (A). The dotted line in (B) indicates mean expression value of 
four non-tumor liver tissues i.e., Fetal/F, Fetal/M, Adult/F, Adult/M. 

Figures 8A-C show gene expression analysis of BMI-1. Semi-quantitative 
RT-PCR analysis (A) of non-tumor tissues and HCC cell lines and real time RT-PCR 
analysis of non-tumor tissues (B) and patient samples (C) was performed. GAPDH 
expression level was used as the control in the analyses of HCC cell lines vs. non- 
tumor human tissues (A). The dotted line in (B) indicates mean expression value of 
four non-tumor liver tissues i.e., Fetal/F, Fetal/M, Adult/F, Adult/M. 

Figures 9A-C show gene expression analysis of CRHBP. Semi-quantitative 
RT-PCR analysis (A) of non-tumor tissues and HCC cell lines and real time RT-PCR 
analysis of non-tumor tissues (B) and patient samples (C) was performed. GAPDH 
expression level was used as the control in the analyses of HCC cell lines vs. non- 
tumor human tissues (A). The dotted line in (B) indicates mean expression value of 
four non-tumor liver tissues i.e., Fetal/F, Fetal/M, Adult/F, Adult/M. 

Figures 10A-C show gene expression analysis of CSTB. Semi-quantitative 
RT-PCR analysis (A) of non-tumor tissues and HCC cell lines and real time RT-PCR 
analysis of non-tumor tissues (B) and patient samples (C) was performed. GAPDH 
expression level was used as the control in the analyses of HCC cell lines vs. non- 
tumor human tissues (A). The dotted line in (B) indicates mean expression value of 
four non-tumor liver tissues i.e., Fetal/F, Fetal/M, Adult/F, Adult/M.

Figures 11A-C show gene expression analysis of DPT. Semi-quantitative 
RT-PCR analysis (A) of non-tumor tissues and HCC cell lines and real time RT-PCR 
analysis of non-tumor tissues (B) and patient samples (C) was performed. GAPDH 
expression level was used as the control in the analyses of HCC cell lines vs. non-
tumor human tissues (A). The dotted line in (B) indicates mean expression value of
four non-tumor liver tissues i.e., Fetal/F, Fetal/M, Adult/F, Adult/M. 

Figures 12A-B show gene expression analysis of ERBB3. Real time RT-PCR 
analysis of non-tumor tissues (A) and patient samples (B) was performed. The dotted 
line in (A) indicates mean expression value of four non-tumor liver tissues i.e., 
Fetal/F, Fetal/M, Adult/F, Adult/M. 

Figures 13A-B show gene expression analysis of EZH2. Real time RT-PCR 
analysis of non-tumor tissues (A) and patient samples (B) was performed. The dotted 
line in (A) indicates mean expression value of four non-tumor liver tissues i.e., 
Fetal/F, Fetal/M, Adult/F, Adult/M. 

Figures 14A-B show gene expression analysis of GPC3. Real time RT-PCR 
analysis of non-tumor tissues (A) and patient samples (B) was performed. The dotted 
line in (A) indicates mean expression value of four non-tumor liver tissues i.e., 
Fetal/F, Fetal/M, Adult/F, Adult/M. 

Figures 15A-B show gene expression analysis of HDGF. Real time RT-PCR 
analysis of non-tumor tissues (A) and patient samples (B) was performed. The dotted 
line in (A) indicates mean expression value of four non-tumor liver tissues i.e., 
Fetal/F, Fetal/M, Adult/F, Adult/M.

Figures 16A-B show gene expression analysis of MDK. Real time RT-PCR 
analysis of non-tumor tissues (A) and patient samples (B) was performed. The dotted 
line in (A) indicates mean expression value of four non-tumor liver tissues i.e., 
Fetal/F, Fetal/M, Adult/F, Adult/M.

Figure 17 shows gene expression analysis of D123. Semi-quantitative RT- 
PCR analysis of non-tumor tissues and HCC cell lines was performed. GAPDH 
expression level was used as the control in the analyses of HCC cell lines vs. non- 
tumor human tissues. 

Figure 18 shows gene expression analysis of FLJ10326. Semi-quantitative
RT-PCR analysis of non-tumor tissues and HCC cell lines was performed. GAPDH 
expression level was used as the control in the analyses of HCC cell lines vs. non- 
tumor human tissues. 

Figure 19 shows gene expression analysis of ICA-1A. Semi-quantitative RT-
PCR analysis of non-tumor tissues and HCC cell lines was performed. GAPDH
expression level was used as the control in the analyses of HCC cell lines vs. non-
tumor human tissues. 

Figure 20 shows gene expression analysis of LASP1. Semi-quantitative RT- 
PCR analysis of non-tumor tissues and HCC cell lines was performed. GAPDH 
expression level was used as the control in the analyses of HCC cell lines vs. non- 
tumor human tissues. 

Figure 21 shows gene expression analysis of PODXL. Semi-quantitative RT- 
PCR analysis of non-tumor tissues and HCC cell lines was performed. GAPDH 
expression level was used as the control in the analyses of HCC cell lines vs. non- 
tumor human tissues. 



DETAILED DESCRIPTION OF THE INVENTION 
Phenotypic changes in cancer may be due to cellular changes at the nucleotide 
level. Thus, some genes may be expressed, overexpressed, or under-expressed in 
tumor cells relative to non-tumor cells. However, a wide variation exists in gene 
expression patterns among cancer patients, including HCC patients. Therefore, 
examining the regulation or expression of a single gene or target, or even of multiple
genes or targets whose regulation or expression varies across different HCC tumors,
may be insufficient for accurate diagnosis or treatment of HCC or for screening of 
HCC therapeutics. Selecting a set of differentially expressed HCC genes, nucleic acid 
molecules, and/or polypeptides, assists in predictable and accurate diagnosis and 
therapy, and design of efficacious therapeutics. 

The invention provides, in part, nucleic acid molecules and polypeptides that 
are differentially expressed in HCC cells, when compared to non-HCC tissue, e.g., 
liver or serum. Thus, the invention provides, in part, molecular markers for HCC 
derived from the analysis of global changes in gene expression ("gene expression 
profiles") between HCC tissue and non-HCC tissue. More specifically, cDNA 
microarrays were used to examine the global cellular changes in matched pairs of 
HCC tumor and non-tumor tissues of patients diagnosed with HCC. In addition, gene 
expression patterns between primary HCC tumors and liver cancer cell lines were 
examined for possible biological variation.

The nucleic acid molecules or polypeptides provided by the invention, as well
as subsets thereof, serve as molecular markers that may be used for example for HCC 
diagnosis; to assess HCC progression or regression; to access the efficacy and/or 
toxicity of HCC therapeutics; and/or to identify candidate compounds for HCC 
therapy, with high predictive accuracy. The gene lists identified permit rapid, simple, 
and reproducible screening of a variety of HCC samples by, for example, nucleic acid 
microarray hybridization or protein expression technology to determine the 
expression of the specific genes, or by other means such as differential display, gel 
electrophoresis, genome mismatch scanning, representational difference analysis, 
clustering, transcript imaging, etc. used singly or in combination. Thus, the selected 
nucleic acid molecules or polypeptides of the invention define standard and 
reproducible differential expression patterns against which to compare the expression 
pattern in a variety of tissue or cells, e.g., HCC tissue or cells or serum, obtained by 
biopsy, autopsy, or from cell lines and/or in vitro treatment or assays. The selected 
nucleic acid molecules or polypeptides of the invention and subsets thereof provide 
reliable detection of HCC cells or tissue, with reduction or elimination of false 
positives or false negatives. In some embodiments, the invention provides composite 
sets of discriminator genes for use as general or global HCC tumor markers. In some 
embodiments, the nucleic acid molecules or polypeptides of the invention may be 
used to assess the suitability of a HCC cell line for use as a model for HCC, as gene 
expression profiles may vary between primary HCC tumors and HCC cell lines. 

Various alternative embodiments and examples of the invention are described 
herein. These embodiments and examples are illustrative and should not be construed 
as limiting the scope of the invention. 



Nucleic Acid Molecules, Polypeptides, And Test Compounds 

Compounds according to the invention include, without limitation, molecules 
substantially identical to the nucleic acid molecules of Tables 1 through 4 (e.g., BMI- 
1, ARMET, CRHBP, CSTB, DPT, ERBB3, EZH2, GPC3, HDGF, MDK, D123, 
FLJ10326, ICAP-1A, LASP1, PODXL) and complements, analogs, fragments, and 
variants thereof, including, for example, the polypeptides described herein that are 
encoded by the nucleic acid molecules of Tables 1 through 4, as well as homologs and 
fragments thereof. In some embodiments of the invention, compounds of the 
invention include antibodies that specifically bind to polypeptides encoded by the 
nucleic acid molecules of Tables 1 through 4. An antibody "specifically binds" an 
antigen when it recognises and binds the antigen, for example, a polypeptide encoded 
by any of the nucleic acid molecules described herein, but does not substantially 
recognise and bind other reference molecules in a sample, for example, a polypeptide 
that is encoded by a nucleic acid molecule that is not substantially identical to any of 
the nucleic acid molecules described herein. Such an antibody has, for example, an 
affinity for the antigen which is at least 10, 100, 1000 or 10000 times greater than the 
affinity of the antibody for another reference molecule in a sample. 

A "substantially identical" sequence is an amino acid or nucleotide sequence 
that differs from a reference sequence only by one or more conservative substitutions, 
as discussed herein, or by one or more non-conservative substitutions, deletions, or 
insertions located at positions of the sequence that do not destroy the biological 
function of the amino acid or nucleic acid molecule, or that do not destroy the 
detectability (e.g., by hybridization or specific binding) of the amino acid or nucleic 
acid molecule. Such a sequence can be any integer from 10% to 99%, or more 
generally at least 10%, 20%, 30%, 40%, 50%, 55% or 60%, or at least 65%, 75%, 80%, 
85%, 90%, or 95%, or as much as 96%, 97%, 98%, or 99% identical when optimally 
aligned at the amino acid or nucleotide level to the sequence used for comparison 
using, for example, the Align Program (Myers and Miller, CABIOS, 1989, 4:11-17) 
or FASTA. For polypeptides, the length of comparison sequences may be at least 2, 
5, 10, or 15 amino acids, or at least 20, 25, or 30 amino acids. In alternate 
embodiments, the length of comparison sequences may be at least 35, 40, or 50 amino 
acids, or over 60, 80, or 100 amino acids. For nucleic acid molecules, the length of 
comparison sequences may be at least 5, 10, 15, 20, or 25 nucleotides, or at least 30, 
40, or 50 nucleotides. In alternate embodiments, the length of comparison sequences 
may be at least 60, 70, 80, or 90 nucleotides, or over 100, 200, or 500 nucleotides. 
Sequence identity can be readily measured using publicly available sequence analysis 
software (e.g., Sequence Analysis Software Package of the Genetics Computer Group, 
University of Wisconsin Biotechnology Center, 1710 University Avenue, Madison, 
Wis. 53705, or BLAST software available from the National Library of Medicine, or 
as described herein). Examples of useful software include the programs Pile-up and 
PrettyBox. Such software matches similar sequences by assigning degrees of 
homology to various substitutions, deletions, and other modifications. 
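
By way of illustration only, percent identity between two sequences that have already been aligned could be computed along the following lines. This is a minimal sketch and is not the Align, FASTA, or BLAST algorithm; the function names, the gap character, the default thresholds, and the convention of excluding gapped columns from the comparison are all assumptions made for the example.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Return percent identity over columns where neither sequence has a gap.

    Both inputs are assumed to be pre-aligned (equal length, gaps marked).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # assumption: gapped columns are excluded, not penalized
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def is_substantially_identical(aligned_a, aligned_b,
                               min_identity=95.0, min_length=20):
    """Hypothetical check: an identity threshold over a minimum comparison length."""
    compared = sum(1 for a, b in zip(aligned_a, aligned_b)
                   if a != "-" and b != "-")
    if compared < min_length:
        return False
    return percent_identity(aligned_a, aligned_b) >= min_identity
```

For example, percent_identity("ACGT", "ACGA") evaluates to 75.0. Other conventions, such as counting gapped columns as mismatches, would give different values for the same alignment.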

Alternatively, or additionally, two nucleic acid sequences may be 
"substantially identical" if they hybridize under high stringency conditions. In some 
embodiments, high stringency conditions are, for example, conditions that allow 
hybridization comparable with the hybridization that occurs using a DNA probe of at 
least 500 nucleotides in length, in a buffer containing 0.5 M NaHPO4, pH 7.2, 7% 
SDS, 1 mM EDTA, and 1% BSA (fraction V), at a temperature of 65°C, or a buffer 
containing 48% formamide, 4.8x SSC, 0.2 M Tris-Cl, pH 7.6, 1x Denhardt's solution, 
10% dextran sulfate, and 0.1% SDS, at a temperature of 42°C. (These are typical 
conditions for high stringency northern or Southern hybridizations.) Hybridizations 
may be carried out over a period of about 20 to 30 minutes, or about 2 to 6 hours, or 
about 10 to 15 hours, or over 24 hours or more. High stringency hybridization is also 
relied upon for the success of numerous techniques routinely performed by molecular 
biologists, such as high stringency PCR, DNA sequencing, single strand 
conformational polymorphism analysis, and in situ hybridization. In contrast to 
northern and Southern hybridizations, these techniques are usually performed with 
relatively short probes (e.g., usually about 16 nucleotides or longer for PCR or 
sequencing and about 40 nucleotides or longer for in situ hybridization). The high 
stringency conditions used in these techniques are well known to those skilled in the 
art of molecular biology, and examples of them can be found, for example, in Ausubel 
et al., Current Protocols in Molecular Biology, John Wiley & Sons, New York, N.Y., 
1998, which is hereby incorporated by reference. 

A "variant" is a nucleic acid molecule that is a recognized variation of a 
nucleic acid molecule or expression product thereof. Splice variants may be 
determined for example by using computer programs, e.g., BLAST. Allelic variants 
have in general a high percent identity to the nucleic acid molecule of interest. "Single 
nucleotide polymorphism" (SNP) refers to a change in a single base as a result of a 
substitution, insertion or deletion. The change may be conservative (purine for purine) 
or non-conservative (purine to pyrimidine) and may or may not result in a change in 
an encoded amino acid. An "analog" is a nucleic acid molecule or polypeptide that 
has been subjected to a chemical modification. Nucleic acid analogs can include 
substitution of a non-traditional base such as queuosine or of an analog such as 
hypoxanthine, or other substitutions known in the art. Analogs in general retain the 
biological activities of the naturally occurring molecules but may confer advantages 
such as longer lifespan or enhanced activity. By "complementary" or "complement" is 
meant that two nucleic acids, e.g., DNA or RNA, contain a sufficient number of 
nucleotides which are capable of forming Watson-Crick base pairs to produce a 
region of double-strandedness between the two nucleic acids. Thus, adenine in one 
strand of DNA or RNA pairs with thymine in an opposing complementary DNA 
strand or with uracil in an opposing complementary RNA strand. It will be understood 
that each nucleotide in a nucleic acid molecule need not form a matched Watson- 
Crick base pair with a nucleotide in an opposing complementary strand to form a 
duplex. A nucleic acid molecule is "complementary" to another nucleic acid 
molecule, or is a "complement" of that other nucleic acid molecule, if it hybridizes, 
under conditions of high stringency, with the second nucleic acid molecule. The 
"complement" of a nucleic acid molecule of Tables 1 through 4 may in some 
embodiments include a nucleic acid molecule that is complementary over the full 
length of the sequence of a nucleic acid molecule of Tables 1 through 4. A 
"fragment" may be any portion of a nucleic acid molecule or polypeptide as described 
herein that is capable of being differentially expressed or detected in an assay or 
screening method according to the invention. 
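
The Watson-Crick pairing rules described above can be sketched in a few lines. The following is an illustrative example only: the function names are hypothetical, both strands are assumed to be written 5' to 3', and the pairing fraction shown is a simple positional count for DNA strands. It is not a substitute for the hybridization-based definition of complementarity given above.

```python
# Watson-Crick partners: A pairs with T in DNA or with U in RNA; G pairs with C.
DNA_PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}
RNA_PAIRS = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str, rna: bool = False) -> str:
    """Return the fully complementary strand, written 5' to 3'."""
    pairs = RNA_PAIRS if rna else DNA_PAIRS
    try:
        return "".join(pairs[base] for base in reversed(seq.upper()))
    except KeyError as exc:
        raise ValueError(f"unexpected base: {exc.args[0]!r}") from None


def paired_fraction(strand: str, opposing: str) -> float:
    """Fraction of positions that form Watson-Crick pairs between two DNA
    strands of equal length when aligned antiparallel (no gaps or bulges)."""
    if not strand or len(strand) != len(opposing):
        raise ValueError("strands must be non-empty and of equal length")
    perfect = reverse_complement(strand)
    matched = sum(1 for a, b in zip(perfect, opposing.upper()) if a == b)
    return matched / len(strand)
```

Here reverse_complement("ATGC") gives "GCAT", and paired_fraction("ATGC", "GCAA") gives 0.75, reflecting the point that not every nucleotide need form a matched pair for two strands to form a duplex.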

Various genes and nucleic acid sequences of the invention may be 
recombinant sequences. The term "recombinant" means that something has been 
recombined, so that when made in reference to a nucleic acid construct the term refers 
to a molecule that is comprised of nucleic acid sequences that are joined together or 
produced by means of molecular biological techniques. The term "recombinant" when 
made in reference to a protein or a polypeptide refers to a protein or polypeptide 
molecule which is expressed using a recombinant nucleic acid construct created by 
means of molecular biological techniques. The term "recombinant" when made in 
reference to genetic composition refers to a gamete or progeny with new 
combinations of alleles that did not occur in the parental genomes. Recombinant 
nucleic acid constructs may include a nucleotide sequence which is ligated to, or is 
manipulated to become ligated to, a nucleic acid sequence to which it is not ligated in 
nature, or to which it is ligated at a different location in nature. Referring to a nucleic 
acid construct as 'recombinant' therefore indicates that the nucleic acid molecule has 
been manipulated using genetic engineering, i.e. by human intervention. 
Recombinant nucleic acid constructs may for example be introduced into a host cell 
by transformation. Such recombinant nucleic acid constructs may include sequences 
derived from the same host cell species or from different host cell species, which have 
been isolated and reintroduced into cells of the host species. Recombinant nucleic 
acid construct sequences may become integrated into a host cell genome, either as a 
result of the original transformation of the host cells, or as the result of subsequent 
recombination and/or repair events. 

As used herein, "heterologous" in reference to a nucleic acid or protein is a 
molecule that has been manipulated by human intervention so that it is located in a 
place other than the place in which it is naturally found. For example, a nucleic acid 
sequence from one species may be introduced into the genome of another species, or a 
nucleic acid sequence from one genomic locus may be moved to another genomic or 
extrachromosomal locus in the same species. A heterologous protein includes, for 
example, a protein expressed from a heterologous coding sequence or a protein 
expressed from a recombinant gene in a cell that would not naturally express the 
protein. 

A compound is "substantially pure" when it is separated from the components 
that naturally accompany it. Typically, a compound is substantially pure when it is at 
least 10%, 20%, 30%, 40%, 50%, or 60%, more generally 70%, 75%, 80%, or 85%, 
or over 90%, 95%, or 99% by weight, of the total material in a sample. Thus, for 
example, a polypeptide that is chemically synthesised, produced by recombinant 
technology, or isolated by known purification techniques, will generally be 
substantially free from its naturally associated components. A substantially pure 
compound can be obtained, for example, by extraction from a natural source; by 
expression of a recombinant nucleic acid molecule encoding a polypeptide compound; 
or by chemical synthesis. Purity can be measured using any appropriate method such 
as column chromatography, gel electrophoresis, HPLC, etc. A nucleic acid molecule 
is substantially pure or "isolated" when it is not immediately contiguous with (i.e., 
covalently linked to) the coding sequences with which it is normally contiguous in the 
naturally occurring genome of the organism from which the DNA of the invention is 
derived. Therefore, an "isolated" gene or nucleic acid molecule is intended to mean a 
gene or nucleic acid molecule which is not flanked by nucleic acid molecules which 
normally (in nature) flank the gene or nucleic acid molecule (such as in genomic 
sequences) and/or has been completely or partially purified from other transcribed 
sequences (as in a cDNA or RNA library). For example, an isolated nucleic acid of 
the invention may be substantially isolated with respect to the complex cellular milieu 
in which it naturally occurs. In some instances, the isolated material will form part of 
a composition (for example, a crude extract containing other substances), buffer 
system or reagent mix. In other circumstances, the material may be purified to essential 
homogeneity, for example as determined by PAGE or column chromatography such 
as HPLC. The term therefore includes, e.g., a recombinant nucleic acid incorporated 
into a vector, such as an autonomously replicating plasmid or virus; or into the 
genomic DNA of a prokaryote or eukaryote, or which exists as a separate molecule 
(e.g., a cDNA or a genomic DNA fragment produced by PCR or restriction 
endonuclease treatment) independent of other sequences. It also includes a 
recombinant nucleic acid which is part of a hybrid gene encoding additional 
polypeptide sequences. Preferably, an isolated nucleic acid comprises at least about 
40%, 50%, 60%, 70%, 80%, 90%, 95%, or 99% (on a molar basis) of all 
macromolecular species present. Thus, an isolated gene or nucleic acid molecule can 
include a gene or nucleic acid molecule which is synthesized chemically or by 
recombinant means. Recombinant DNA contained in a vector is included in the 
definition of "isolated" as used herein. Also, isolated nucleic acid molecules include 
recombinant DNA molecules in heterologous host cells, as well as partially or 
substantially purified DNA molecules in solution. In vivo and in vitro RNA 
transcripts of the DNA molecules of the present invention are also encompassed by 
"isolated" nucleic acid molecules. Such isolated nucleic acid molecules are useful in 
the manufacture of the encoded polypeptide, as probes for isolating homologous 
sequences (e.g., from other mammalian species), for gene mapping (e.g., by in situ 
hybridization with chromosomes), or for detecting expression of the gene in tissue 
(e.g., human tissue, such as peripheral blood), such as by Northern blot analysis. 

Polypeptide compounds can be prepared by, for example, replacing, deleting, 
or inserting an amino acid residue at any position of a peptide or a peptide analog, for 
example, a peptide as described herein, with other conservative amino acid residues, 
i.e., residues having similar physical, biological, or chemical properties. It is well 
known in the art that some modifications and changes can be made in the structure of 
a polypeptide without substantially altering the biological function of that peptide, to 
obtain a biologically equivalent polypeptide. In one aspect of the invention, 
polypeptides of the present invention also extend to biologically equivalent peptides 
that differ from a portion of the sequence of the polypeptides of the present invention 
by conservative amino acid substitutions. As used herein, the term "conserved amino 
acid substitutions" refers to the substitution of one amino acid for another at a given 
location in the peptide, where the substitution can be made without substantial loss of 
the relevant function. In making such changes, substitutions of like amino acid 
residues can be made on the basis of relative similarity of side-chain substituents, for 
example, their size, charge, hydrophobicity, hydrophilicity, and the like, and such 
substitutions may be assayed for their effect on the function of the peptide by routine 
testing. Conservative changes can also include the substitution of a chemically 
derivatised moiety for a non-derivatised residue, by for example, reaction of a 
functional side group of an amino acid. Peptides or peptide analogs can be 
synthesised by standard chemical techniques, for example, by automated synthesis 
using solution or solid phase synthesis methodology. Automated peptide synthesisers 
are commercially available and use techniques well known in the art. Peptides and 
peptide analogs can also be prepared using recombinant DNA technology using 
standard methods such as those described in, for example, Sambrook, et al. 
(Molecular Cloning: A Laboratory Manual. 2nd ed., Cold Spring Harbor Laboratory, 
Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., 1989) or Ausubel et 
al. (Current Protocols in Molecular Biology, John Wiley & Sons, 1994). Computer 
programs such as LASERGENE software (DNASTAR, Madison Wis.), 
MACVECTOR software (Genetics Computer Group, Madison Wis.) and RasMol 
software (www.umass.edu/microbio/rasmol) may be used to determine which and 
how many amino acid residues in a particular portion of the protein may be 
substituted, inserted, or deleted without abolishing biological or immunological 
activity. 

Monitoring changes in gene expression may also be advantageous when 
screening candidate HCC therapeutics. Often candidate compounds are screened and 
prescreened for the ability to interact with a major target without regard to other 
effects they may have on cells or in the subject to be treated, such as toxicity, which 
prevent the development and use of the potential compound. Thus, the methods of the 
invention may be used to identify candidate compounds suitable for HCC therapy. 

In general, candidate or test compounds are identified from large libraries of 
both natural products and synthetic (or semi-synthetic) extracts, or from chemical libraries, 
according to methods known in the art. Those skilled in the field of drug discovery 
and development will understand that the precise source of test extracts or compounds 
is not critical to the methods of the invention. Accordingly, virtually any number of 
chemical extracts or compounds can be screened using the exemplary methods 
described herein. Examples of such extracts or compounds include, but are not limited 
to, plant-, fungal-, prokaryotic- or animal-based extracts, fermentation broths, and 
synthetic compounds, as well as modification of existing compounds. Numerous 
methods are also available for generating random or directed synthesis (e.g., semi- 
synthesis or total synthesis) of any number of chemical compounds, including, but not 
limited to, saccharide-, lipid-, peptide-, and nucleic acid-based compounds. Synthetic 
compound libraries are commercially available. Alternatively, libraries of natural 
compounds in the form of bacterial, fungal, plant, and animal extracts are 
commercially available from a number of sources, including Biotics (Sussex, UK), 
Xenova (Slough, UK), Harbor Branch Oceanographic Institute (Ft Pierce, FL, USA) 
and PharmaMar, MA, USA. Furthermore, if desired, any library or compound is 
readily modified using standard chemical, physical, or biochemical methods. 
Candidate compounds useful for treating HCC may also be identified by assessing 
variations in the expression of one or more HCC markers, from Tables 1 through 4, 
prior to and after contacting HCC cells or tissues with candidate pharmacological 
agents for the treatment of HCC. The cells may be grown in culture (e.g., from a HCC 
cell line), or may be obtained from a subject (e.g., in a clinical trial of candidate 
pharmaceutical agents to treat HCC). Alterations in expression of one or more 
HCC nucleic acid markers (drug targets), in HCC cells or tissues tested before and 
after contact with a candidate pharmacological agent to treat HCC, indicate 
progression, regression, or stasis of the HCC thereby indicating efficacy of candidate 
agents and concomitant identification of candidate compounds for therapeutic use in 
HCC. Candidate compounds may also be screened for toxicity, specificity, etc. 
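
As a purely illustrative sketch of the before-and-after comparison described above, the change in marker expression following contact with a candidate agent could be summarised as a log2 fold change per marker. The function names, the pseudocount, the two-fold cutoff, and any expression values used with this code are assumptions for the example and are not data or methods from this disclosure.

```python
import math


def log2_fold_changes(before: dict, after: dict, pseudocount: float = 1.0) -> dict:
    """log2(after / before) for each marker measured in both conditions.

    The pseudocount (an assumption) avoids division by zero for unexpressed markers.
    """
    shared = before.keys() & after.keys()
    return {
        gene: math.log2((after[gene] + pseudocount) / (before[gene] + pseudocount))
        for gene in sorted(shared)
    }


def responsive_markers(before: dict, after: dict, min_abs_log2fc: float = 1.0) -> dict:
    """Markers whose expression changed at least two-fold (by default) after treatment."""
    changes = log2_fold_changes(before, after)
    return {gene: fc for gene, fc in changes.items() if abs(fc) >= min_abs_log2fc}
```

With hypothetical signal values of 799 before and 199 after treatment for one marker, the log2 fold change is -2.0, that is, a four-fold decrease. Whether a change of that size indicates efficacy would depend on the marker and on suitable controls.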

When a crude extract is found to modulate expression levels of any of the 
nucleic acid molecules or polypeptides of the invention, further fractionation of the 
positive lead extract is necessary to isolate chemical constituents responsible for the 
observed effect. Thus, the goal of the extraction, fractionation, and purification 
process is the careful characterization and identification of a chemical entity within 
the crude extract having the modulatory activities. The same assays described herein 
for the detection of activities in mixtures of compounds can be used to purify the 
active component and to test derivatives thereof. Methods of fractionation and 
purification of such heterogeneous extracts are known in the art. If desired, 
compounds shown to be useful agents for treatment are chemically modified 
according to methods known in the art. Compounds identified as being of therapeutic, 
prophylactic, diagnostic, or other value may be subsequently analyzed using HCC cell 
lines or an animal model for HCC. 

Arrays, Microarrays, Libraries, Databases, And Kits 

In one aspect, the invention provides nucleic acid or polypeptide arrays and 
biological assays thereof. Arrays refer to ordered arrangements of at least two nucleic 
acid molecules or polypeptides on a substrate, which can be any rigid or semi-rigid 
support to which two nucleic acid molecules or polypeptides may be attached. In 
some embodiments, a substrate may be a liquid medium. Substrates include 
membranes, filters, chips, slides, wafers, fibers, beads, gels, capillaries, plates, 
polymers, microparticles, etc. Because the nucleic acid molecules or polypeptides 
are located at specified locations on the substrate, the hybridization or binding 
patterns and intensities create a unique expression profile, which can be interpreted in 
terms of expression levels of particular genes and can be correlated with HCC 
progression, regression, therapy, etc. 

High density nucleic acid or polypeptide arrays are also referred to as 
"microarrays," and may for example be used to monitor the presence or level of 
expression of a large number of genes or polypeptides or for detecting sequence 
variations, mutations and polymorphisms. Arrays and microarrays generally require a 
solid support (for example, nylon, glass, ceramic, plastic, silica, aluminosilicates, 
borosilicates, metal oxides such as alumina and nickel oxide, various clays, 
nitrocellulose, etc.) to which the nucleic acid molecules or polypeptides are attached 
in a specified 2-dimensional arrangement, such that the pattern of hybridization or 
binding to a probe is easily determinable. In some embodiments, at least one of the 
nucleic acid molecules or polypeptides is a control, standard, or reference molecule, 
such as a housekeeping gene or portion thereof (e.g., PBGD, GAPDH), that may 
assist in the normalization of expression levels or assist in the determining of nucleic 
acid quality and binding characteristics; reagent quality and effectiveness; 
hybridization success; analysis thresholds and success, etc. 
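
One simple way in which a control element might be used for normalization is to divide each marker signal by the mean signal of the reference elements on the same array. The sketch below illustrates that idea only; the function name and the choice of a simple ratio are assumptions, and other normalization schemes are equally possible.

```python
def normalize_to_reference(intensities: dict, reference_genes=("GAPDH",)) -> dict:
    """Express each non-reference signal relative to the mean reference signal.

    `intensities` maps an array element name to its measured signal on one array.
    """
    refs = [intensities[gene] for gene in reference_genes if gene in intensities]
    if not refs:
        raise ValueError("no reference gene signal found on this array")
    ref_mean = sum(refs) / len(refs)
    if ref_mean <= 0:
        raise ValueError("reference signal must be positive")
    return {
        gene: value / ref_mean
        for gene, value in intensities.items()
        if gene not in reference_genes
    }
```

For instance, with a hypothetical GAPDH signal of 200 and a marker signal of 800 on the same array, the normalized marker level is 4.0. Passing reference_genes=("GAPDH", "PBGD") would average the two housekeeping signals instead.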

Nucleic acid molecules or polypeptide probes may be derived from 
compounds as described herein for example in Tables 1 through 4, and the 
compositions of the invention may be used as elements on a microarray to analyze 
gene expression profiles. For the purpose of such arrays, "nucleic acids" may include 
any polymer or oligomer of nucleosides or nucleotides (polynucleotides or 
oligonucleotides), which include pyrimidine and purine bases, preferably cytosine, 
thymine, and uracil, and adenine and guanine, respectively. A variety of methods are 
known for making and using microarrays, as for example disclosed in Cheung, V.G., 
et al., 1999; Lipshutz, R.J., et al., 1999; Bowtell, D.D.L., 1999; Schweitzer, B., 
2002; and MacBeath, G. and Schreiber, S.L., 2000; all of which are incorporated herein 
by reference. In some embodiments, the microarray substrate may be coated with a 
compound to enhance synthesis of the nucleic acid molecule on the substrate as 
disclosed in, for example, U.S. Pat. No. 4,458,066. In some embodiments, probes 
may be synthesized directly on the substrate in a predetermined ordered arrangement. 
Methods for storing, querying and analyzing microarray data have been 
disclosed in, for example, United States Patent No. 6,484,183; United States Patent 
No. 6,188,783; and Holloway, A.J., 2002; each of which is incorporated herein by 
reference. In an alternative aspect, the invention provides nucleic acid or polypeptide 
microarrays including a number of distinct and selected nucleic acid or polypeptide 
array sequences of the invention. The number of distinct sequences may for example 
be any integer between 2 and 1 x 10^5, such as at least 10^2, 10^3, 10^4, or 10^5. The size of 
the distinct sequences may vary depending on the intended use, and can be 
determined by a skilled person. For example, the nucleic acid sequences may range 
from 15 to 5000 bases or more, or any integer within this range. 

Microarrays may also be used to examine the expression of all the genes in a 
tissue or cell such as a liver cell or a HCC cell. Thus, the nucleic acid molecules of 
Tables 1 through 4 may be attached to a solid support, hybridized with single stranded 
detectably-labeled cDNAs (corresponding to a "complementary" orientation), and 
quantified using an appropriate method such that a signal is detected at each location 
at which hybridization has taken place. The intensity of the signal would then reflect 
the amount of gene expression. Similarly, protein microarrays may be used according 
to methods known in the art. Comparison of results from different cells or tissue, for 
example, hepatocellular carcinoma cells or tissue, hepatitis virus infected cells or 
tissue, non-tumor cells or tissue, normal cells or tissue, cirrhotic liver cells or tissue, 
or any combination thereof would elucidate differing levels of expression of specified 
genes from the different sources. 
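
A comparison of this kind across matched tumor and non-tumor samples might be summarised, for a single gene, as in the following sketch. It is offered as an illustration under stated assumptions: the function names and the two-fold threshold are hypothetical, signals are assumed to be positive and already normalized, and each list position is assumed to correspond to the same patient in both lists.

```python
import math
from statistics import mean


def paired_log2_ratios(tumor: list, non_tumor: list) -> list:
    """Per-patient log2(tumor / non-tumor) for one gene across matched pairs."""
    if not tumor or len(tumor) != len(non_tumor):
        raise ValueError("need one tumor and one non-tumor value per patient")
    if any(value <= 0 for value in tumor + non_tumor):
        raise ValueError("signals must be positive")
    return [math.log2(t / n) for t, n in zip(tumor, non_tumor)]


def summarize_gene(tumor: list, non_tumor: list, threshold: float = 1.0) -> dict:
    """Mean log2 ratio and the fraction of patients at least two-fold up or down."""
    ratios = paired_log2_ratios(tumor, non_tumor)
    up = sum(1 for r in ratios if r >= threshold)
    down = sum(1 for r in ratios if r <= -threshold)
    return {
        "mean_log2_ratio": mean(ratios),
        "fraction_up": up / len(ratios),
        "fraction_down": down / len(ratios),
    }
```

Reporting the fraction of patients in which a gene is changed, alongside the mean ratio, is one way to reflect the patient-to-patient variation in expression noted earlier in this description.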

In one aspect of the invention, libraries may be constructed of bacterial strains 
each of which bears a plasmid expressing a different nucleic acid molecule of any one 
or more of Tables 1 through 4 under control of an inducible promoter. ORFs are 
amplified using PCR and cloned into a vector that enables their expression as N- 
terminal his-tagged polypeptides. These amplicons are also used to construct 
hybridization microarrays and enable targeted gene disruption, reducing expenses. A 
suitable expression host (e.g. E. coli) is selected, and genes encoding particular 
biochemical activities are identified by screening arrayed pools of his-tagged proteins 
as described previously (Martzen, M.R., et al., 1999). 

The invention also provides databases including the nucleic acid and 
polypeptide sequences described herein, as well as gene expression information in 
various cancerous and non-cancerous liver and liver cell line samples. Such databases 
may be used to access information that may aid in diagnosis, prognosis, or other 
HCC-related methods of the invention. A database as used herein includes any 
electronic form of the compounds (e.g., nucleic acid and polypeptide sequences) of 
the invention, and information regarding these compounds, and includes computer 
readable media and any suitable form for storing the information. 
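
As one hypothetical example of such an electronic form, expression measurements could be held in a small relational table and queried by gene and sample type. The schema, column names, and function names below are assumptions chosen for illustration; the sketch uses the sqlite3 module from the Python standard library with an in-memory database.

```python
import sqlite3


def build_expression_db(records):
    """Create an in-memory database from (gene, sample_id, sample_type, level) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE expression ("
        " gene TEXT NOT NULL,"
        " sample_id TEXT NOT NULL,"
        " sample_type TEXT NOT NULL,"  # e.g. 'tumor', 'non-tumor', 'cell line'
        " level REAL NOT NULL)"
    )
    conn.executemany("INSERT INTO expression VALUES (?, ?, ?, ?)", records)
    conn.commit()
    return conn


def mean_level_by_type(conn, gene: str) -> dict:
    """Average stored expression level of one gene for each sample type."""
    rows = conn.execute(
        "SELECT sample_type, AVG(level) FROM expression"
        " WHERE gene = ? GROUP BY sample_type ORDER BY sample_type",
        (gene,),
    ).fetchall()
    return dict(rows)
```

A persistent file path could be passed to sqlite3.connect in place of ":memory:" if the information is to be stored on computer readable media.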

The invention also provides kits including for example one or more of the 
nucleic acid molecules or polypeptides of the invention (or complements, analogs, 
variants, or fragments thereof), an appropriate buffer, appropriate reagents for 
detection, and appropriate controls. For example, a kit may include probes or primers 
(which may or may not be detectably labeled) suitable for hybridization or 
amplification, or may include antibodies or ligands suitable for specific binding. A 
kit may also include written or electronic instructions. 

Diagnostic and Other Uses 

A wide variety of detectable labels and conjugation techniques are known by 
those skilled in the art and may be used in various nucleic acid molecule and 
polypeptide assays to diagnose HCC. The nucleic acid molecules, proteins, 
antibodies and other compounds according to the invention may be labeled for 
purposes of assay by joining them, either covalently or noncovalently, with a 
detectable label. By "detectably labeled" is meant any means for marking and 
identifying the presence of a molecule, e.g., an oligonucleotide probe or primer, a 
gene or fragment thereof, a cDNA molecule, or a polypeptide. Methods for 
detectably-labeling a molecule are well known in the art and include, without 
limitation, radioactive labeling (e.g., with an isotope such as 32P or 35S) and 
nonradioactive labeling, such as enzymatic labeling (for example, using horseradish 
peroxidase or alkaline phosphatase), chemiluminescent labeling, fluorescent labeling 
(for example, using fluorescein), bioluminescent labeling, or antibody detection of a 
ligand attached to the probe. Also included in this definition is a molecule that is 
detectably labeled by an indirect means, for example, a molecule that is bound with a 
first moiety (such as biotin) that is, in turn, bound to a second moiety that may be 
observed or assayed (such as fluorescein-labeled streptavidin). Labels also include 
digoxigenin, luciferases, and aequorin. Synthesis of labeled molecules may be performed 
using labels such as 32P-dCTP, Cy3-dCTP, Cy5-dCTP, or 35S-methionine. 
Compounds according to the invention may also be directly labeled by chemical 
conjugation to amines, thiols and other groups present in the molecules using reagents 
such as BODIPY or FITC (Molecular Probes, Eugene, OR, USA). 

Compounds, compositions, and reagents according to the invention may be 
used to detect and quantify differential gene expression; absence, presence, or excess 
expression of nucleic acid molecules (e.g., mRNAs) or polypeptides; or to monitor 
nucleic acid molecule (e.g., mRNA) or polypeptide levels during therapeutic 
intervention in subjects with HCC. The compounds, compositions, and reagents 
according to the invention can also be utilized as markers of HCC treatment efficacy 
over a period ranging from days to months to years. The diagnostic assays may use 
hybridization, amplification, ligand binding, or antibody technologies to compare 
gene expression in a biological sample from a subject to reference samples or 
standards, or to cancerous and non-cancerous samples from the subject, in order to 
detect altered gene expression. Qualitative or quantitative methods for this 
comparison are known in the art, and any suitable method may be used. 

In order to provide a basis for the diagnosis of HCC, a non-tumor or standard 
gene expression profile may be established. This may be accomplished by combining 
a biological sample taken from normal or non-tumor subjects or from non-cancerous 
tissue from a subject with HCC, with a probe under conditions for hybridization or 
amplification. Standard hybridization may be quantified by comparing the values 
obtained using non-tumor subjects or non-cancerous tissue with values from an 
experiment in which a known amount of a substantially purified target sequence is 
used. Standard values obtained in this manner may be compared with values obtained 
from samples from patients who are symptomatic for HCC. Deviation from standard 
values toward those associated with HCC is used to diagnose HCC. Such assays may 
also be used to monitor the efficacy of a particular HCC therapy in animal studies, in 
clinical trials, or to monitor the treatment of an individual patient or groups of 
patients. Once the presence of HCC is established in a subject and a treatment 
protocol is initiated, assays according to the invention may be repeated on a regular 
basis to determine if the level of expression in the subject begins to approximate that 
which is observed in a non-tumor subject, and to monitor the progression of HCC in 
the subject. The results obtained from successive assays may be used to show the 
efficacy of treatment over a period ranging from several days to months. 

Compounds, compositions, and reagents (e.g., microarrays) according to the 
invention may be used to monitor the progression or regression of HCC. The 
differences in gene expression between healthy and diseased tissues or cells can be 
assessed and cataloged. By analyzing changes in patterns of gene expression, HCC 
can be diagnosed at earlier stages before the subject is symptomatic. Similarly, by 
analyzing gene expression profiles and changes therein, prognoses may be 
formulated, and therapies may be designed. Progression or regression of HCC may 
be determined by comparison of two or more different HCC samples taken at multiple 
different times from a subject (e.g., at least 2, 3, 4, or 5 or more time points) over the 
course of days to months. For example, progression or regression may be evaluated 
by assessments of expression of sets of two or more, or as many as all, of the nucleic 
acid molecules of Tables 1 through 4 in an HCC tissue sample from a subject before, 
during, and following treatment for HCC. 

Compounds, compositions, and reagents (e.g., microarrays) according to the 
invention can also be used to monitor the efficacy of a therapy. For therapies with 
known side effects, compounds, compositions, and reagents (e.g., microarrays) 
according to the invention may be employed to improve the therapeutic regimen. For 
example, dosages that cause changes in gene profiles that represent efficacious 
treatment may be determined, and expression profiles associated with the onset of 
undesirable side effects may be avoided. This approach may be more sensitive and 
rapid than waiting for the subject to show inadequate improvement, or to manifest 
side effects, before altering the course of treatment. In another aspect of the invention, 
pre- and post-treatment alterations in expression of two or more sets of HCC nucleic 
acid molecules in HCC cells or tissues may be used to assess treatment parameters 
including, but not limited to: dosage, method of administration, timing of 
administration, and combination with other known treatments for HCC. 

In some aspects, any one or more of the compounds provided herein may be 
used in therapeutic applications. For example, selected compounds provided herein 
may be used as therapeutic targets for the identification of agents that modulate their 
expression levels and/or activity and that may be used to treat HCC. 

EXAMPLES 

Experimental Procedures 

RNA isolation, RNA amplification and cDNA microarray hybridization 

Paired samples of tumor and corresponding non-tumor tissues were obtained 
from resected liver specimens from thirty-seven (37) patients who had been diagnosed 
with hepatitis B virus (HBV)-associated HCC and had undergone curative liver 
resection. A validation tissue set composed of 58 liver biopsy samples from an 
independent cohort of twenty-nine (29) patients, who also had HBV-associated HCC 
and had undergone liver resection, was used. Informed consent from each patient and 
institutional research and ethics committee approval were obtained. Tissues were snap 
frozen in liquid nitrogen and stored at -150°C. A small section of each specimen was 
sampled and total RNA was isolated from tissues using TRIZOL® reagent (Life 
Technologies, Bethesda, MD, USA) according to the manufacturer's instructions. The 
integrity of the RNA specimen was verified by gel electrophoresis. 

The human liver cancer cell lines used in this study were: PLC/PRF/5, 
HA22T, Huh1, Huh4, Tong, Hep3B, SNU182, SNU449, SNU475, HepG2, 
Huh6, Huh7, SKHep1, and Mahlavu. All cell lines were cultured under 
conditions recommended by the American Type Culture Collection (VA, USA). 
Total RNA was extracted using TRIZOL® reagent (Life Technologies, Bethesda, MD, 
USA) according to the manufacturer's instructions. 

RNA was linearly amplified using a procedure modified from Eberwine and 
coworkers (Eberwine et al., 1992). Briefly, total RNA was reverse transcribed using a 
63-nucleotide synthetic primer containing the T7 polymerase binding site, 
5'-GGCCAGTGAATTGTAATACGACTCACTATAGGGAGGCGG(T)24-3'. Full-length 
double-stranded cDNA synthesis was accomplished in the presence of E. coli 
DNA polymerase I, DNA ligase and RNase H. The cDNA was made blunt-ended 
with T4 DNA polymerase, and purified by extraction in a mixture of phenol, 
chloroform and isoamyl alcohol, and precipitation in the presence of ammonium 
acetate and ethanol. Purified double-stranded cDNA was then transcribed with T7 
polymerase (T7 Megascript® Kit, Ambion) to yield linearly amplified antisense RNA, 
which was subsequently purified with RNeasy® mini-columns (Qiagen). Human 
universal reference RNA (Stratagene, La Jolla, CA), including total RNA from 10 
different human cell lines, was amplified and used as the reference for cDNA 
microarray analysis. 

Approximately 9000 human cDNA features (Incyte Genomics, Palo Alto, CA, 
USA) were spotted onto poly-L-lysine coated slides using an OmniGrid® arrayer 
(GeneMachines). Probes were generated from the amplified RNA material and 
hybridized to the chip as described elsewhere (Sotiriou et al., 2002). Briefly, 4 µg of 
amplified RNA was reverse-transcribed using random hexamers and directly 
labeled with Cy3-conjugated dUTP (reference RNA) or Cy5-conjugated dUTP 
(sample RNA). Hybridization was performed in the presence of 25% formamide 
and 5X SSC for 16 h at 42°C. Slides were scanned with an Axon 4000b laser scanner 
(Axon Instruments) after washing and drying. To minimize the effects of labeling 
biases, reciprocal dye swap labeling experiments were performed for each sample. 

Data analysis 

The 37 paired HCC tumor and non-tumor liver samples, and liver cancer 
cell lines were processed on the microarray on two separate prints, and the validation 
tissue set was processed on a third print. Raw data were analyzed on GenePix analysis 
software version 3.0 (Axon Instruments, Burlingame, CA, USA) and uploaded to a 
relational database maintained by the Center for Information Technology at the 
National Institutes of Health (i.e., MADB). The cDNA clones used for the microarray 
are represented by their UniGene identifiers. For each array, the 
logarithmic expression ratio of each spot was normalized by subtracting 
the median logarithmic ratio for the same array. Data were filtered to exclude spots 
with a size of less than 25 µm and any poor quality or missing spots. Since the 
correlation of the overall data from reciprocal labeling was good, values obtained 
from reciprocal labeling experiments were averaged. In addition, any gene features 
that were found to be absent from the data in more than 50% of patient samples in 
either set of arrays were excluded, and gene features that were common to the data from 
the array print sets were retained. Application of these filters resulted in the inclusion 
of 8716 of the total 9127 features in subsequent analysis. Statistical comparison of 
genes between HCC tumors and non-tumors was performed by the Wilcoxon rank-sum 
non-parametric test. To evaluate gene expression patterns, hierarchical clustering 
using one minus Pearson's correlation metric and average linkage (Eisen et al., 1998) 
and multidimensional scaling were performed on normalized data (mean equals zero, 
standard deviation equals one). Functional characterization of genes was based on 
Gene Ontology (The Gene Ontology Consortium, 2000) and other published works 
known to those of ordinary skill in the art. 
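
By way of illustration only, the per-array normalization, reciprocal-label averaging, and missing-feature filter described above may be sketched as follows. The function names, the dictionary data layout, the use of base-2 logarithms, and the sign flip applied to the dye-swapped array are assumptions made for this sketch and are not taken from the procedure itself.

```python
import statistics

def median_center(log_ratios):
    """Subtract the array-wide median from every usable spot on one array.

    `log_ratios` maps a feature ID to its log2(sample/reference) ratio,
    or to None when the spot was flagged as poor quality or missing.
    """
    usable = [v for v in log_ratios.values() if v is not None]
    med = statistics.median(usable)
    return {k: (None if v is None else v - med) for k, v in log_ratios.items()}

def average_dye_swap(forward, swapped):
    """Average a forward array with its reciprocal dye-swap array.

    Assumption: the swapped array stores log2(reference/sample), so its
    sign is flipped before averaging. A feature missing on either array
    is reported as None.
    """
    out = {}
    for k, f in forward.items():
        s = swapped.get(k)
        out[k] = None if f is None or s is None else (f - s) / 2.0
    return out

def filter_features(arrays, max_missing=0.5):
    """Keep features that are absent in at most `max_missing` of the arrays."""
    kept = []
    for k in arrays[0]:
        missing = sum(1 for a in arrays if a.get(k) is None)
        if missing / len(arrays) <= max_missing:
            kept.append(k)
    return kept
```

In such a sketch, each array is median-centered first, the forward and dye-swapped arrays of a sample are then averaged, and the feature filter is applied across all samples of a print set.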

The quality of a set of selected gene features to be used as potential 
markers was measured by estimating the probability that its observed performance, 
in terms of number of misclassified tissue samples, could occur by chance alone. This 
was achieved by performing a series of Monte Carlo simulations (Davison and 
Hinkley, 1997) upon the expression data of the selected genes. In each simulation, the 
tissues' labels were randomly permuted and the number of misclassifications was 
noted. A total of one million runs of Monte Carlo simulations were performed. The 
reported P-value (denoted as P_a) is the fraction of times the permutations generated as 
few misclassifications as, or fewer than, the original labeling. To determine whether 
a set of genes observed to have a good performance as tumor discriminators could 
appear merely by chance, a different series of Monte Carlo simulations was carried out. In 
each simulation, an equivalent number of gene features was randomly picked from a 
designated large population of features, and the performance of the random gene set 
was evaluated by the number of tissue samples that were misclassified. A total of 
10,000 runs of Monte Carlo simulations were performed for each evaluation. The 
P-value (P_b) is the fraction of times the random gene set performed as well as, or better 
than, the selected gene set. 
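
By way of illustration only, the label-permutation estimate of P_a may be sketched as follows. The classifier that produced the misclassification counts is not specified above, so the leave-one-out nearest-centroid rule used here is a hypothetical stand-in, and all function names, the default number of runs, and the seed are assumptions of this sketch.

```python
import random

def loo_nearest_centroid_errors(profiles, labels):
    """Leave-one-out misclassification count with a nearest-centroid rule.

    Hypothetical stand-in classifier; any function with the same
    signature can be passed to permutation_p_value instead.
    """
    errors = 0
    for i, (x, y) in enumerate(zip(profiles, labels)):
        best_label, best_dist = None, float("inf")
        for lab in sorted(set(labels)):
            members = [p for j, (p, l) in enumerate(zip(profiles, labels))
                       if l == lab and j != i]
            if not members:
                continue
            centroid = [sum(col) / len(members) for col in zip(*members)]
            dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
            if dist < best_dist:
                best_label, best_dist = lab, dist
        errors += best_label != y
    return errors

def permutation_p_value(profiles, labels, count_errors, runs=1000, seed=0):
    """Fraction of label permutations that give as few misclassifications
    as, or fewer than, the original labeling."""
    rng = random.Random(seed)
    observed = count_errors(profiles, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(runs):
        rng.shuffle(shuffled)
        if count_errors(profiles, shuffled) <= observed:
            hits += 1
    return hits / runs
```

The estimate of P_b follows the same pattern, except that each run draws a random gene set of equal size from the feature population instead of permuting the tissue labels.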

The significance of the number of observed overlapping genes after 
intersection of the important gene lists, derived as described herein, with gene lists 
reported previously was approximated by measuring the probabilities that such 
overlap could occur by chance alone. A separate series of Monte Carlo simulations 
was employed to estimate the P-values of the two-group comparisons. In each 
simulation, two lists of genes corresponding to the two groups were generated. Each 
list was constructed by randomly selecting genes, as many as the number of genes in 
its corresponding group, from the entire collection of genes of its 
respective microarray gene set. The two random gene lists were then intersected. 
The P-value (P_c) of the comparison was obtained by generating and intersecting the 
two random lists 100 million times, and reported as the fraction of times the random 
overlap is equal to or greater than the observed one. To validate the utility of the various 
expression cassettes to distinguish HCC tumor from non-tumor liver, the prediction 
accuracy of each discriminator cassette was assessed on an independent tissue set 
comprising 58 liver clinical biopsies from 29 patients using a k-Nearest Neighbor 
(kNN) classification algorithm (k=3) using Pearson correlation to measure the 
similarity between expression profiles. The algorithm was trained against the dataset 
comprising 74 tissue samples from 37 patients before testing against the new tissue set. 
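
By way of illustration only, the random-overlap estimate of P_c and the k-Nearest Neighbor classification with Pearson correlation may be sketched as follows. The function names, argument layout, default number of runs, seed, and majority-vote rule are assumptions of this sketch; far more runs are described above than would be practical in this plain form.

```python
import math
import random
from collections import Counter

def overlap_p_value(universe_a, universe_b, size_a, size_b, observed,
                    runs=10000, seed=0):
    """Fraction of runs in which two randomly drawn gene lists overlap
    by at least `observed` genes."""
    rng = random.Random(seed)
    pool_a, pool_b = sorted(universe_a), sorted(universe_b)
    hits = 0
    for _ in range(runs):
        list_a = set(rng.sample(pool_a, size_a))
        list_b = set(rng.sample(pool_b, size_b))
        if len(list_a & list_b) >= observed:
            hits += 1
    return hits / runs

def pearson(x, y):
    """Pearson correlation of two profiles (undefined if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def knn_predict(train_profiles, train_labels, query, k=3):
    """Label `query` by majority vote among the k training profiles that
    correlate most strongly with it."""
    ranked = sorted(zip(train_profiles, train_labels),
                    key=lambda pl: pearson(pl[0], query), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

With k=3 and two classes, the majority vote cannot tie, which is one practical reason an odd k is convenient for a tumor versus non-tumor decision.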



Real-time semi-quantitative RT-PCR 

Total RNA from individual tissue samples was analyzed for the expression 
levels of selected genes by real-time semi-quantitative RT-PCR using the LightCycler 
RNA amplification kit SYBR Green I on the LightCycler (Roche, Basel, Switzerland) 
according to the manufacturer's instructions. Briefly, one-step RT-PCR reactions 
consisted of an initial incubation at 55°C for 10 min, followed by a denaturation step 
at 95°C for 30 s, and amplification for 40 cycles of 1 s at 95°C, 10 s at 57°C, and 13 s 
at 72°C. For each reaction, 10 ng of total RNA was analyzed. The gene specific 
primers designed were, for example, as follows: 
IGFBP3 5'-ATAATCATCATCAAGAAAGGGCAT-3' and 5'-GAAGGGCGACACTGCTT-3'; 
EGFR 5'-GCGTCTCTTGCCGGAATG-3' and 5'-GGCTCACCCTCCAGAAGCTT-3'; 
ERBB2 5'-GGATGTGCGGCTCGTACAC-3' and 5'-TAATTTTGACATGGTTGGGACTCTT-3'; 
ERBB3 5'-CGGTTATGTCATGCCAGATACAC-3' and 5'-ACAGAACTGAGACCCACTGAAGAA-3'; 
PBGD 5'-GAGTGATTCGCGTGGGTACC-3' and 5'-GGCTCCGATGGTGAAGCC-3'. The 
relative expression level of each gene of interest in each individual tissue sample 
was normalized against that of the "housekeeping" gene PBGD. Data are presented as 
the level of gene expression in each HCC tumor relative to its corresponding 
non-tumor liver specimen. 
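
By way of illustration only, one common way of expressing a housekeeping-normalized, tumor-relative-to-non-tumor expression level is the comparative threshold-cycle calculation sketched below. The procedure above does not state which calculation was used, so the formula, the assumption of approximately 100% amplification efficiency, and the function and parameter names are all assumptions of this sketch.

```python
def relative_expression(ct_gene_tumor, ct_ref_tumor,
                        ct_gene_nontumor, ct_ref_nontumor):
    """Fold expression of a gene in tumor relative to paired non-tumor tissue.

    Each argument is a threshold (crossing-point) cycle number. The
    reference gene (PBGD in the text) is subtracted first, then the
    paired non-tumor specimen is used as the baseline. Assumes the
    product doubles every cycle.
    """
    delta_tumor = ct_gene_tumor - ct_ref_tumor
    delta_nontumor = ct_gene_nontumor - ct_ref_nontumor
    return 2.0 ** -(delta_tumor - delta_nontumor)
```

A value above 1 indicates higher normalized expression in the tumor than in its paired non-tumor specimen, and a value below 1 indicates lower expression.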

Assessment Of Global Gene Expression Differences Between HCC Tumors and 
Non-Tumor Liver Specimens 

The gene expression patterns of primary HCC tumors and the 
corresponding non-tumor liver tissues from 37 patients were examined by cDNA 
microarray. Amplified RNA prepared from each experimental sample was labeled 
with Cy5 and hybridized on the array with pooled human 'common 
reference' amplified RNA labeled with Cy3. Reciprocal dye swap replicate 
hybridizations were performed to minimize technical noise. Since the overall 
correlation of reciprocal labeling was good, values obtained from reciprocal 
labeling experiments were averaged and used in subsequent analyses. Firstly, the 
overall natural patterns of gene expression in the HCC tumor and non-tumor liver 
tissues were assessed based on unsupervised hierarchical clustering. Analysis of 
variance in expression levels for each gene across all the tissues indicated that 500 
gene features (containing 493 unique UniGenes) showed the largest variability across 
both HCC tumor and non-tumor liver tissues (Figure 1). Included in this list are AFP, 
an often used prognostic marker for HCC, and other genes associated with HCC such 
as HGF, MYC, and a ras family member RAN. Hierarchical clustering analysis based 
on these highly variant genes (derived from the 37 pairs of HCC tumor and non-tumor 
liver samples and using the 500 most variable gene features) separated the tissues 
into two main clusters, one representing the HCC tumors and the other the non-tumor 
liver tissues, with only six of 37 HCC tumors misclassified as non-tumors. Thus, the 
molecular configuration of HCC can be readily distinguished from that of non-tumor 
liver with minimal data manipulation. 
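
By way of illustration only, the selection of the most variable gene features and their unsupervised clustering with one minus Pearson's correlation and average linkage may be sketched as follows. Ranking features by population variance, the function names, and the data layout are assumptions of this sketch; the plain agglomerative loop is intended for small examples rather than full arrays.

```python
import math
import statistics

def most_variable_features(matrix, n_top):
    """Return the n_top feature IDs with the largest variance across samples.

    `matrix` maps a feature ID to its list of normalized log ratios,
    one value per tissue sample.
    """
    return sorted(matrix, key=lambda g: statistics.pvariance(matrix[g]),
                  reverse=True)[:n_top]

def correlation_distance(x, y):
    """One minus Pearson's correlation (undefined for constant profiles)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 1.0 - sxy / math.sqrt(sxx * syy)

def average_linkage(profiles, n_clusters=2):
    """Agglomerative clustering of sample profiles with average linkage.

    Returns a list of clusters, each a list of sample indices.
    """
    clusters = [[i] for i in range(len(profiles))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                total = sum(correlation_distance(profiles[i], profiles[j])
                            for i in clusters[a] for j in clusters[b])
                d = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] += clusters.pop(b)  # b > a, so index a stays valid
    return clusters
```

Stopping the merge at two clusters mirrors the two main clusters described above, after which the cluster membership of each tissue can be compared with its known tumor or non-tumor status.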

Next, to investigate differential gene expression patterns between HCC tumors 
and non-tumor livers, the Wilcoxon rank-sum test was used and the top 2.5% 
candidate genes which displayed the smallest (best) P-value scores (P ≤ 1 x 10^-6) and at 
least 1.5-fold change in gene expression were identified, resulting in a list of 218 
genes (Table 1). For these 218 genes, false discovery rate analysis indicated a 
false-positive error of less than 0.4%. Multidimensional scaling analysis based on these 
outlier genes indicated that the HCC tumors were a more heterogeneous population 
than the non-tumor liver tissues (Figure 2). 
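
By way of illustration only, the selection of outlier genes by rank-sum P-value and fold change may be sketched as follows. The normal approximation without tie correction, the use of the difference in median log2 ratios as the fold-change measure, and all function and parameter names are assumptions of this sketch and are not stated above.

```python
import math
import statistics

def rank_sum_p(x, y):
    """Two-sided Wilcoxon rank-sum P-value by normal approximation.

    Tied values receive average ranks; the variance is not tie-corrected.
    """
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2.0 + 1.0  # average 1-based rank of the tie run
        i = j + 1
    w = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    n1, n2 = len(x), len(y)
    mean = n1 * (n1 + n2 + 1) / 2.0
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (w - mean) / sd
    return math.erfc(abs(z) / math.sqrt(2.0))

def select_outliers(tumor, nontumor, p_cut, min_fold=1.5):
    """Genes passing both the P-value and the fold-change threshold.

    `tumor` and `nontumor` map a gene ID to its list of log2 ratios.
    Returns (gene, P-value, median log2 difference), smallest P first.
    """
    selected = []
    for gene in tumor:
        p = rank_sum_p(tumor[gene], nontumor[gene])
        diff = statistics.median(tumor[gene]) - statistics.median(nontumor[gene])
        if p <= p_cut and abs(diff) >= math.log2(min_fold):
            selected.append((gene, p, diff))
    return sorted(selected, key=lambda item: item[1])
```

A positive median difference in this sketch corresponds to higher expression in the tumor specimens, and a negative one to lower expression.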

Cancer cell lines derived from the primary tumor have traditionally been used 
as in vitro model systems for investigating the function of genes in the in vivo tumor 
environment. Using the 218 differentially expressed outlier genes identified in the 
clinical samples, the expression pattern of the same genes in 14 established human 
liver cancer cell lines was analyzed. These cell lines exhibited gene expression 
profiles that were different from the clinical HCC tumor tissues (Figure 2), suggesting 
that they may have accumulated additional genetic or epigenetic alterations in culture 
and are not entirely reflective of the primary tumor biology. 

Identification Of Gene Clusters Differentially Repressed In HCC Tumors 

Among the statistically significant 218 genes that distinguished HCC 
tumors from non-tumor liver tissue specimens, more genes were observed to be 
overexpressed than under-expressed in the malignant tissue specimens relative to the 
non-tumor tissue specimens. Mapping of the chromosomal location of these 218 
unique outlier genes indicated that a disproportionate number of genes was located on 
chromosome 1 (Figure 3A), particularly in the 1q region, and that the majority of these 
genes were more highly expressed in the tumor tissues. Further characterization of 
these outlier genes revealed that a substantial proportion of genes were involved in 
transport (e.g., PEA15), RNA processing (e.g., RDBP), and metabolic processes (e.g., 
NME1) and showed increased expression in HCC tumor specimens, possibly 
indicating accelerated rates of metabolism (Figure 3B, Table 1). Several outlier genes 
(e.g., SMT3H1) are members of the ubiquitin-proteasome pathway, suggesting 
deregulation of this pathway in HCC. A gene cluster associated with lymphocyte 
infiltrate that included the expression of genes such as IGKC and IGJ was 
observed, and transcription factors (e.g., ESR1) and genes involved in controlling 
growth and differentiation (e.g., GRN), and signal transduction (e.g., CSTB) formed 
the other dominant gene groups. Notably, the polycomb group protein BMI1 was 
consistently expressed at much higher levels in HCC tumor specimens (Figure 4). 

Table 1. Genes significantly differentially expressed between HCC tumor and 
non-tumor liver tissues. 



Function | Gene Symbol | Gene Name | UniGene Identifier | Expression change in HCC tumor* | GenBank No. 


transcription 
factors 


ILF2 


interleukin enhancer 
bindina factor 2 4SkD 


Hs.75117 


? 


AA307289 




BMI1 


murine leukemia viral (bmi- 
1) oncogene homoloq 


Hs.431 


? 


AA884913 




TAF9 


TAF9 RNA polymerase II, 
TATA box binding protein 
(TBP)-associated factor, 32 
kD 


Hs.60679 


? 


U21858 




RFX5 


regulatory factor X, 5 
(influences HLA class II 
expression) 


Hs.1 66891 


? 


AL050135 




SSRP1 
ZNF146 


structure specific recognition 
protein 1 

zinc finaer protein 146 


Hs.79162 


? 


AI635077 




SREBF2 


sterol regulatory element 
bindina transcription factor 2 


Hs.301819 
Hs.108689 


? 
? 


X70394 
AA608556 




MAFG 


v-maf musculoaponeurotic 
fibrosarcoma (avian) 
oncogene farnilv, protein G 


Hs.252229 


? 


AF059195 




CHD4 


chromodomain helicase 
DNA binding protein 4 


Hs.74441 


? 


BE408958 




KID /I A «fl 

NK4A1 
ESR1 


nuclear receptor subfamilv 
4, group A, member 1 
estrogen receptor 1 1 


no. I 1 |g 

Hs.1657 


7 
? 


NM 00213 
5 

AL078582 




ZIMF238 
FOSB 1 


zinc finger protein 238 
FBJ murine osteosarcoma 
viral oncogene homolog B 


Hs.69997 
Hs.75678 


? 
? 


AJ223321 
L49169 




ID1 


inhibitor of DNA bindina 1 
dominant negative helix- 
loop-helix protein 


He 7KAOA 
n5. f 04^14 I 


o 

f 


S78825 




FOS 


v-fos FBJ murine 
osteosarcoma viral 
oncogene homoloq 


Hs.25647 


? 


V01512 


RNA 

processing 


H2AFY 


H2A histone family, 
member Y 


Hs.75258 


? 


AA307460 




SNRPB 
RPS7 


small nuclear 
ribonucleoprotein 
polypeptides B and B1 
ribosomal protein S7 


Hs.83753 


? 


BE252108 




MRPS14 


mitochondrial ribosomal 
protein S14 


Hs.301547 
Hs.247324 


? 
? 


AA315872 I 
AW973521 




HNRPU 
SNRPD2 


heterogeneous nuclear 
ribonucleoprotein U 
(scaffold attachment factor 
A) 

small nuclear 


Hs.103804 
Hs.53125 


? 
? 


X65488 
AA315774 
NCL 



ribonucleoprotein D2 
3lypep«deHfr5km 
nucteolln 



RPS10 | ribosomat protein S10~ 



Hs.79110 



RPL6 
SFPQ 



ribosomal protein L6 
splicing factor 
proline/glutamine rich 
(polypyrimidine tract-binding 
irotein-associated) 
similar to S. pombe dim1+ 



Hs.76230 



MARS 
SFRS9 



methionine-tRNA 
synthetase 
splicing factor, 
arginlne/serine-rich 9 



Hs.349961 
Hs/1 80610 



Hs.5074 



Hs.279946 
Hs.77608 



? 



? 



AK000250 



AW245775 



AW67543 0 
X70944 



AI814618 



BE299937 
AL021546 



RBM3 RNA binding motif protein 3 Hs - 3 <>™04 



NM_00674 
3 



U2AF65 



U2 small nuclear 
ribonucleoprotein auxiliary 
factor f65kP) 



Hs.7655 



splicing factor, 
SFRS1 I ar 9 |nir »e/serine-rich 1 

(splicing factor 2, alternate 
splicing factor) 



Hs.73737 



SNRPE 



small nuclear 
ribonucleoprotein 
polypeptide E 



Hs.334612 



SF3B4 I s P ,icin 9 factor 3b, subunit 4, 
49kD 



Hs.25797 



RDBP RD RNA-bindinq protein I Hs.1060frT 



SNRPF 



small nuclear 
ribonucleoprotein 
polypeptide F 



Hs.105465 



RRM1 ribonucleotide reductase M1 Hs.2934 



polypeptide 



RPL38 ribosomal protein L38 



HNRPH1 



U5- 
116KP 
RPLP1 



heterogeneous nuclear 
ribonucleoprotein H1 (H) 



Hs.2017 



Hs.245710 



U5 snRNP-specific protein, 
116 kD 



Hs.151787 



ribosomal protein. larp e r P1 

OXA1L I oxidase (cytochrome c) 
assembly 1-like 



Hs.1 77592 
Hs.151134 



PNA 

replication/ 
repair 



APPRT 



APP-ribosyltransferase 
(NAP+; poly (ADP-ribose) 
polymerase) 



Hs.1 77766 



PRKPC 



protein kinase, PNA- 
activated, catalytic 
polypeptide 



Hs.1 55637 



SMC4 (structural 
SMC4L1 I maln tenance of 

chromosomes 4, yeast)-like 
J 

histone H2A.F/Z vaiianf 
flap structure-specific 
endonuclease 1 



Hs.50758 



Hs,3010Q5 
Hs.4756 






7_ 
? 



_7_ 

? 



AA936430 



M72709 



X12466 



NM_G0585 



X16105 



AA649986 



X59543 



AI832988 



BE296051 



P21163 



AW963733 
X80695 



M18112 



U34994 



AB019987 



BE409809 
BE278623 
MCM2 



HAT1 
RAD50 



CBX1 



CSPG6 



FUS 



UNG2 



minichromosome 
maintenance deficient (S. 
cerevisiae) 2 (mitotin) 



Hs.57101 



histone acetyltransferase 1 
RAD50 (S. cerevisiae) 
homoloq 



Hs.13340 
Hs.41587 



chromobox homolog 1 (HP1 
beta homoloq Drosophila ) 



Hs.77254 



chondroitin sulfate 
proteoglycan 6 (bamacan) 



Hs.24485 



fusion, derived from t(12;16) 
malignant liposarcoma 



Hs.99969 



uracil-DNA glycosvlase 2 I Hs.3Q4l" 



1_ 
? 



BE250461 



AL046741 



NM_00544 
5 



BE396632 



cell cycle/ 
growth/ 
differentia- 
tion 



Hs.1 19651 



GPC3 



glypican 3 



CDKN2A 



cyclin-dependent kinase 
inhibitor 2A (melanoma, 
p16, inhibits ; 



Hs.1174 



CDK4> 



MOK 



midkine (neurite growth- 
promoting factor 2) 



Hs.82045 



NTRK1 
CCNE2 

HDGF 



neurotrophic tyrosine j Hs.85844 
kinase, receptor, type 1 

cyclin E2 | Hs.30484~ 

hepatoma-derived growth | Hs.8952(T 
factor (high-mobility group 
protein ' 



1-like) 



TP53BP2 



tumor protein p53-binding 
protein, 2 



Hs.44585 



CDC23 



GRN 



CDC23 (cell division cycle 
23, yeast, homoloq) 



Hs.1 53546 



GHR 



granulin 



Hs.180577 



growth hormone receptor I Hs.i2Si«n 



IGFBP3 



CYR61 



HGF 



insulin-like growth factor 
binding protein 3 



Hs.77326 



cysteine-rich, angiogenic 
inducer. 61 



Hs.8867 



hepatocyte growth factor 
(hepapoietin A; scatter 
factor) 



Hs.809 



? 

T 
T 



U50410 



AI859822 



AA427949 



AA075110 



Al 12391 6 



AF053977 



AI375908 



X06562 



BE336944 



Y12084 



X16323 



apoptosis I DAP3 



PDCD5 



death associated protein 3 I Hs.159677 



immune 
response 



programmed cell death 5 I Hs.l664fi« 



PPIA 



TMPO 



peptidylprolyl isomerase A 
(cyclophilinA) 



Hs.342389 



thymopoietin 



PPIB 
CD5L 



Hs.1 1355 



peptidylprolyl isomerase B | Hs.699 

'cyclophilin B) 

CD5 antigen-like (scavenger | Hs.52002^ 
receptor cysteine rich 
family) 



SCYA14 



small inducible cytokine 
subfamily A (Cys-Cvs) , 



Hs.20144 






? 

T 



AA207194 



AA452724 



AW732921 



U09087 



BE386706 

NM_00589 
4 



NM_00416 
6 
SDF1 
C7 


member 14 

stromal ceii-denved factor 1 
immunoglobulin heavy 
constant gamma 3 (G3m 
marker) 


Hs.237356 
Hs.300697 


? 
? 


L36033 
D78345 




IGJ 


complement component 7 
immunoglobulin J 
polypeptide, linker protein 
for immunoglobulin alpha 
and mu polypeptides 


Hs.78065 
Hs.76325 


? 
? 


X86328 
AW1 72754 




IGKC 


immunoglobulin kappa 
constant 


Hs.156110 


? 


AW404507 


cell 

QUI K70IUI 1/ 

cytoskeletal 
organization 


LBR 


lam In B receptor 


Hs.152931 


? 


L25931 




ITGB1 


integrin, beta 1 (fibronectin 
receptor, beta polypeptide, 
antigen CD29 includes 
MDF2, MSK12) 


Hs.287797 


? 


W38716 




LAMR1 


laminin receptor 1 (67kD, 
ribosomal protein SA) 


Hs.181357 


? 


AW328280 




CAPZA2 


capping protein (actin 
filament) muscle Z-line, 
alpha 2 


Hs 75546 


o 

f 


UUoool 




ICAP-1A 


integrin cytoplasmic domain- 
associated protein 1 


Hs.1 73274 


? 


AF012023 




DNCH1 


aynein, cytoplasmic, heavy 
polypeptide 1 


Hs.7720 


? 


AB002323 




ARPC1A 
DPT 


actin related protein 2/3 
complex, subunit 1 A (41 kD) 
dermatopontin 


Hs.90370 


? 






MMP15 


indirix metauoprotemase 15 
(membrane-inserted) 


Hs.80552 
Hs.80343 


? 
? 


AW016451 
D85510 




ARHE 


ras nomoiog gene family, 
member E 


Hs.6838 


? 


W03441 


signal ~~ 
transduction 


CAP2 
CSTB 


adenylyl cyclase-associated 


Hs.296341 


? 


AW779995 




ARMET 


cystatin B fstefin B) 
arginine-ncn, mutated in 
early staae tumors 


Hs.695 
Hs.75412 


? 
? 


AI831499 
AA582041 




EFNA1 


ephrin-A1 


Hs.1624 


? 


NM_00442 
a 




PPP2R5A 


protein phosphatase 2, 
regulatory subunit B (B56), 
alpha isoform 


Hs.1 55079 


? 


o . 

AA234460 




RAN 


RAN, member RAS 
oncogene family 


Hs.10842 


? 


NM_00632 

c 




CALM2 
LASP1 


calmodulin 2 
phosphorylase kinase, 
delta) 

LIM and SH3 protein 1 


Hs.182278 


? 


D45887 




SHC1 
RGS5 


SHC (Src homology 2 
domain-containing) 
transforming protein 1 
regulator of G-protein 


Hs.334851 
Hs.81972 

Hs.24950 


? 
? 

? 


AI304506 
X68148 

AI674877 
HAX1 
GABRE 


siqnallinq 5 
noi oinaing protein 
gamma-am inobutyric acid 
(GABA) A receptor, epsilon 


Hs.15318 
Hs.22785 


? 
? 


BE260953 
NM 00496 
1 




ARFGEF 
2 


ADP-nbosylation factor 
guanine nucleotide- 
exchange factor 2 (brefeldin 
A-inniDiteaj 


Hs.1 18249 


? 


AA099S82 




MAPK6 


mitogen-activated protein 
Kinase o 


Hs.271980 


? 


NM 00274 
8 






guanine nucleotide binding 
protein (vj protein), beta 
polypeptide 2-like 1 


Hs.5662 


7 


BE206815 




CRDD4 


v-erb-b2 avian erythroblastic 
leukemia viral oncogene 
homoloq 3 


Hs.199067 


? 


AI565773 




DSCR1 


Down syndrome critical 
region gene 1 


Hs. 184222 


? 


U85267 




CRHBP 


corticotropin releasing 
hormone-bindinq protein 


Hs.115617 


? 


NM 00188 
2 




STK39 


serine threonine kinase 39 
(STE20/SPS1 homoiog, 
yeast) 


Hs.1 99263 


? 


F26137 


uDiquiun- 

proteasome 

pathway 


UBD 


diubiquitin 


Hs.44532 


? 


NM 00639 
8 




PSMB4 


proteasome (prosome, 
macropain) subunit, beta 
type. 4 


Hs.89545 


? 


BE336637 




SSA2 


Sjogren syndrome antigen 
A2 (60kD, ribonucleoprotein 
autoantiaen SS-A/Ro) 


Hs.554 


? 


NM 00460 
0 




USP14 


ubiquitin specific protease 
14 (tRNA-guanine 
trans glycosylase) 


Hs.75981 


? 


NM 00515 
1 




PSMA1 


proteasome (prosome, 
macropain) subunit, alpha 
type, 1 


Hs.82159 


? 


AI889267 




EIF3S9 


eukaryotic translation 
initiation factor 3, subunit 9 

Sato -1 A CLn\ 
(eia, 1 lOKUJ 


Hs.57783 


? 


U62583 




SMT3H1 


SMT3 (suppressor of mif 
two 3, yeast) homoloq 1 


Hs.85119 


? 


AA1 60893 




UBE2D2 


ubiquitin-conjugating 
enzyme E2D 2 (homologous 
to yeast UBC4/5) 


Hs.1 08332 


? 


NM 00333 
9 




PSMD11 


proteasome (prosome, 
macropain) 26S subunit, 
non-ATPase, 11 


Hs.90744 


? 


AB003102 




PSMB3 


proteasome (prosome, 
macropain) subunit, beta 
type, 3 


Hs.82793 


? 


AI028114 ; 




PSMD4 


proteasome (prosome, 
macropain) 26S subunit. 
noivATPase. 4 


Hs.148495 


? 


AA604027 


molecular 


CCT5 


chaperonin containlnq 


Hs.1 600 


7 


D43950 
chaperone 


CCT3 


TCP1. subunit 5 feDsilon) 
chaperonin containing 
TCP1. subunlt 3 (qamma) 


Hs.1708 


? 


BE302501 




HSPA5 


heat shock 70kD protein 5 
(glucose-regulated protein, 
78kD) 


Hs.75410 


j ? 


AL043206 




CCT4 
HSPA4 


chaperonin containing 
TCP1 , subunit 4 (delta) 
heat shock 70kD protein 4 


Hs.79150 


? 


U38846 




CCT7 
HSPA8 


chaperonin containing 
TCP1. subunit 7 (eta) 

heat «?hoHc 7flkD rtrntoln ft 


Hs.90093 
Hs.108809 

1 1 _ A f\f\ A A A 


? 
? 


AB023420 
AA314436 


transport 


CCT6A 

ANXA2 
PDZK1 


chaperonin containing 

TCP1 siihimit RA / 7 Ato -f \ 

annexin A2 


Hs. 18041 4 
Hs.82916 

Hs.217493 


? 
? 

? 


AW249010 
L27706 

BE293414 




SYPL 
TIMM17A 


PDZ domain containing 1 
synaptoohysin-like protein 
translocase of inner 
mitochondrial membrane 17 

Muinoiocj m (yeast J 


Hs.15456 
Hs.80919 
Hs.20716 


? 
f 

m 

? 


AF012281 

O / A*KJ 1 

AW247564 




XP01 


exportin 1 (CRM1, yeast, 


Hs.79090 


? 


D89729 




HMGM4 
NUCB2 


high mobility group 
nucieosomai omding domain 
4 

nucleobindin 2 


Hs.236774 


? 


U90549 




UGTRFI 

V/VJ 1 i\uL 

1 


uuK-gaiactose transporter 
related 


Hs.3164 
Hs.1 54073 


? 
? 


AW951523 
AW1 92554 




PEA15 


phosphoprotein enriched in 
abirocyies TO 


Hs.1 94673 


? 


Y13736 




CLTA 


clathrin, light polypeptide 
iLca) 


Hs.104143 


? 


AW974204 




ATP6IP1 


ATPase, H+ transporting, 
lysosomal interacting protein 


Hs.6551 


? 


NM 00118 
3 




SSR2 


oignai sequence receptor, 
beta (translocon-associated 
oroteln hpfc^ 


Hs.74564 


? 


BE313059 




AP3S1 


adaptor-related protein 
complex 3, stoma 1 subunit 


Hs.80917 


? 


D63643 




VDAC2 


vuiiage-aepenaent anion 
channel 2 


Hs.78902 


? 


AI015604 




VPS45A 


vciuuuidr protein sorting 45 A 
(yeast) 


Hs.6650 


? 


AA702845 




VCP 


valosin-containing protein 


Hs.106357 


? 


NM 00712 
6 




SACM2L 


SAC2 (suppressor of actin 
mutations 2, yeast, 
homotog)-like 


Hs.1 69407 


? 


AK001725 




KPNB1 


karyopherin (importin) beta 
1 


Hs.180446 


? 


L38951 




SLC21A3 


solute carrier family 21 
(organic anion transporter), 
member 3 


Hs.46440 


? j 


U21943 




SLC22A1 


solute carrier family 22 
(organic cation transporter). 


Hs.1 17367 


? 


X98332 
HSPA5 


I member 1 
Heat shock 70kD protein 5 

■ { f%\t fr^CA^^Ont lining _ _ _X— I 

i (giucose-reguiated protein, 
78kD) 


Hs. 75410 


? 




metabolism 


GNPAT 


1 glyceronephosphate O- 
acyltransferase 


Hs.12482 


? 


AF043937 




NME2 


non-metastatic cells 2, 
protein (NM23B) expressed 
1 in 


Hs.275163 


? 


L16785 




NME1 


non-metastatic cells 1, 
■ \ji uicii i \iNivi^ioM/ expressed 
in 


Hs.1 18638 


r ? 


AA1 47871 




UQCRH 
TALDO1


ubiquinol-cytochrome c
reductase hinge protein
transaldolase 1 


Hs.73818 
I Hs.77290 


I ? 


AI093521 




P5CR2 


pyrroline 5-carboxylate
reductase isoform 


] Hs.274287 


I ? 
? 


I AFmnAnn 
AI161110 




GFPT1 


glutamine-fructose-6-
phosphate transaminase 1


Hs.1674 


? 


NM_002056




DPM1
ACLY 


dolichyl-phosphate
mannosyltransferase
polypeptide 1, catalytic
subunit

ATP citrate lyase 


Hs.5085 


? 


AW1 73486 




B4GALT3 


UDP-Gal:betaGlcNAc beta 
1,4- galactosyltransferase, 
polypeptide 3 


Hs.1 74140 
Hs.321231 


? 
? 


AW967351 
Y12509 




GCN1L1 


GCN1 (general control of
amino-acid synthesis 1,
yeast)-like 1


Hs.75354 


? 


D86973 




DPAGT1 


dolichyl-phosphate (UDP-N-
acetylglucosamine) N-
acetylglucosaminephosphotransferase 1 (GlcNAc-1-P
transferase)


Hs.26433 


? 


Z82022 




ACAA1 


acetyl-Coenzyme A
acyltransferase 1
(peroxisomal 3-oxoacyl-
Coenzyme A thiolase)


Hs.1 661 60 


? | 


NM_001607




ALDH8A1


aldehyde dehydrogenase 8
family, member A1


Hs.1 8443 


? 


AI051566 




SRD5A2 


steroid-5-alpha-reductase,
alpha polypeptide 2


Hs.1 989 


? 


M74047 




NAT2 


N-acetyltransferase 2 
(arylamine N-
acetyltransferase)


Hs.2 


? 


D90040 




GSTZ1


glutathione transferase zeta
1 (maleylacetoacetate
isomerase)


Hs.26403


? 


U86529 




ADH1B I 


alcohol dehydrogenase 1B j 
(class I), beta polypeptide 


Hs.4 


? 


M24317 




CYP2C8 


cytochrome P450, subfamily 
IIC (mephenytoin 4- 
hydroxylase), polypeptide 8


Hs.174220 


? 


M17398 
Unknown



CYP2E 



ECT2 



DKFZP56 
4B167 

KIAA0016 

DXS1357 
E 



cytochrome P450, subfamily 
IIE (ethanol-inducible)



ESTs, Highly similar to 
H33_HUMAN HISTONE
H3.3 [H.sapiens]



epithelial cell transforming 
sequence 2 oncogene 



Hs.75183 



Hs.349754 



DKFZP564B167 protein 

translocase of outer 
mitochondrial membrane 20 

it) homoloc 
accessory proteins 
BAP31/BAP29 



C20orf24 



FLJ10326



KIAA0117 



DEK 



PODXL 



DSS1 



KIAA0475 gene product 



chromosome 20 open 
reading frame 24 



hypothetical protein 
FLJ10326 



KIAA0117 protein



Unknown 

DEK oncogene (DNA 
binding) 



podocalyxin-like



Deleted in split-hand/split- 
foot 1 region 



Hs.1 22579 
Hs.76285 



Hs.75187 



Hs.291904 



Hs.5737 



Hs.1 84062 



Hs.262823 



Hs.322478 



Hs.1 10713 



Hs.1 6426 



Hs.333495 



? 

T 



? 

T 



J02843 



AA313375 



AL1 37710 



AI032331 
D13641 



AA524523 



AI340141 



AA665998 



AL1 33010 



NMJD0221 

J 

AI888504 



BE395330 



W79057 



PRQ1855 



hypothetical protein 
PRQ1855 



Hs.283558 



KIAA0470 



Homo sapiens mRNA; 
cDNA DKFZp434l052 (from 
clone PKFZp434IQ52) 



KIAA0470 gene product 



MYLE 



MYLE protein 



Homo sapiens cDNA 
FLJ14232 fis, clone 
NT2RP4000035 



MAGED2 



melanoma antigen, family D, 



FLJ12806 



hypothetical protein 
FLJ12806 



YWHAB 



tyrosine 3- 

monooxygenase/tryptophan 
5-monooxygenase 
activation protein, beta 
polypeptide 



LOC5123 



Unknown 



hypothetical protein 



KIAA0592 



C1orf9 



KIAA0592 protein 



KIAA0788 



MGC1955 



KIAA0205 gene product



chromosome 1 open 
reading frame 9 



KIAA0788 protein 



hypothetical protein 



Hs.378917 



Hs.25132 



Hs.1 1902 



Hs.101810 



Hs.4943 



Hs.1 07637 



Hs.279920 



Hs.1 81 444 



Hs.1 3273 



Hs.3610 



Hs.1 08636 



Hs.246112 



Hs.334787 



? 

T 



? 
? 



? 



AI379021 



AA425759 



NM_01481 
2 



AA628977 



AI675122 



Z98046 



BE044682 



AL008725 



AL031775 
AI1 90653 

AL080183 



D86960 



BE466870 
AB018331 
6



KIAA0731 



MGC19556 



KIAA0731 protein 



Hs.6214 



C7orf14 
D123 



chromosome 7 open 
reading frame 14 



Hs.84790 



D123 gene product 



C5orf8 



chromosome 5 open 
reading frame 8 



Hs.82043 



Hs.75864 



D86978 



U27112 



BE254013 



AD24 



Homo sapiens cDNA: 
FLJ23020 fis, clone 
LNG00943 
AD24 protein 



Hs.6127 



WHIP 



Werner helicase interacting 
protein 



Hs.74899 



Hs.236828 



AA054768 



AI017605 



AA481600 



BC-2 



DKFZP54 
7E101 



putative breast 
adenocarcinoma marker 
(32kD) 



Hs.12107 



DKFZP547E1010 protein 



Hs.323817 



FLJ22251 



hypothetical protein 
FLJ22251 



Hs.289064 



ESTs 



KIAA0187 



KIAA0187 gene product 



Hs.89267 



Hs.10848 



? 

T 



AF042384 



NM_01560 
7 



AA595663 



D80009 



MPV17 



MAWBP 



MpV17 transgene, murine 
homolog. glomerulosclerosis 



Hs.75659 



MAWD binding protein 



Homo sapiens cDNA 
FLJ37464 fis, clone
BRAWH2011795, weakly
similar to LIVER 
CARBOXYLESTERASE 
PRECURSOR (EC 3.1.1.1) 



Hs.16341 



Hs.346947 



Homo sapiens SNC73 
protein (SNC73) mRNA, 
complete cds 



Hs.293441 



ESTs, Highly similar to 
SMHU1B metallothionein 1B
[H.sapiens]



Hs.36102 



Homo sapiens unknown 
mRNA 



Hs.367982 



FLJ12666 



hypothetical protein 
FLJ12666 



Hs.23767 



RNAHP 



RNA helicase-related 
protein 



Hs.8765 



'gene expression level showing at least 1.6-fold change ii i 
liver tissues (P<1 x1 Or) 



NM_00243 
7 



AI866254 



N44535 



AA290845 



R99207 



H72532 



AW952494 



AI814448 



HCC tumors relative to non-tumor 



Real-time RT-PCR analysis was performed on a panel of genes, including 
IGFBP3 and ERBB3 in all the 37 matched HCC tumor and non-tumor liver samples. 
Expression of a known "housekeeping" gene porphobilinogen deaminase (PBGD) 
(Fink et al, 1998) was used as normalizing control. The results of real-time RT-PCR 
analyses of IGFBP3 and ERBB3 indicated that IGFBP3 expression was diminished in 






WO 2004/108964 



PCT/SG2004/000166 



35 of 37 HCC tumors relative to their corresponding non-tumor liver tissues (Figure
5A), while ERBB3 expression was elevated in 34 of 37 tumor samples (Figure 5B).
Since ERBB3 is defective in tyrosine kinase activity and requires dimerization with
other receptors, possibly another member of the ERBB family (Riese and Stern,
1998), the hypothesis that HCC tumors expressing high levels of ERBB3 were
associated with high expression of ERBB2 or EGFR was tested. The expression of
ERBB2 was elevated in 12 of 37 tumors (Figure 5C), while high EGFR expression
was found in 15 of 37 tumors (Figure 5D). A significant concomitant increase in
ERBB2 expression (t-test P-value = 0.0026), but no association with high EGFR
expression (t-test P-value = 0.31), was found in the top fifty percentile of high ERBB3-
expressing HCC tumors, indicating that the cognate partners of ERBB3 appeared to
be present in those tumors expressing high levels of ERBB3. Real-time and semi-
quantitative RT-PCR analyses were also conducted on a panel of genes identified as
differentially expressed in HCC (Figures 6-21).
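
The comparison described above can be illustrated with a minimal sketch that splits tumors at the median ERBB3 level and compares ERBB2 expression between the two halves. The expression values below are hypothetical placeholders and are not data from the study. The function names are likewise illustrative, and the use of Welch's unequal-variance t statistic is an assumption, since the text states only that a t-test was used.

```python
from math import sqrt
from statistics import mean, median, variance


def split_by_median(key_values, other_values):
    """Split other_values into (high, low) groups according to whether
    the paired key value lies above the median of key_values."""
    cutoff = median(key_values)
    high = [o for k, o in zip(key_values, other_values) if k > cutoff]
    low = [o for k, o in zip(key_values, other_values) if k <= cutoff]
    return high, low


def welch_t(a, b):
    """Welch's t statistic for two independent samples
    (assumed variant; the source does not say which t-test was used)."""
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error


# Hypothetical normalized expression values, one entry per tumor.
erbb3 = [0.5, 0.8, 1.1, 1.9, 2.4, 3.0, 3.6, 4.2]
erbb2 = [0.9, 1.0, 1.2, 1.1, 2.0, 2.3, 2.1, 2.6]

high, low = split_by_median(erbb3, erbb2)
t_stat = welch_t(high, low)
print(f"ERBB2 in high-ERBB3 half vs low-ERBB3 half: t = {t_stat:.2f}")
```

A P-value would then be read from a t distribution with the appropriate degrees of freedom; that step is omitted here to keep the sketch within the standard library.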

Validation Of HCC Tumor Discriminator Expression Cassettes

Changes in gene expression of HCC using microarray technology have been
reported (Chen et al, 2002; Okabe et al, 2001; Honda et al, 2001; Shirota et al, 2001;
Tackels-Horne et al, 2001; Xu et al, 2001a; Xu et al, 2001b). The intersection of the
important gene lists, derived as described herein, with gene lists reported previously
was explored, and resulted in the identification of additional gene lists or "expression
cassettes" (Tables 2-4) that were capable of distinguishing HCC tumor from non-
tumor liver tissues.
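
The intersection step described above can be sketched as a lookup of microarray features against a set of previously reported UniGene identifiers. This is only an illustration under stated assumptions: the function name, the tuple layout, and the identifier "Hs.99999" are hypothetical, and the short lists stand in for the full gene lists. Because several array features can map to one UniGene, the sketch counts features and unique UniGenes separately.

```python
def intersect_gene_lists(own_features, published_ids):
    """Return the features whose UniGene identifier also appears
    in the previously published identifier list."""
    published = set(published_ids)
    return [feature for feature in own_features if feature[0] in published]


# Illustrative (UniGene identifier, gene symbol) pairs; "Hs.99999" is hypothetical.
own = [
    ("Hs.119651", "GPC3"),
    ("Hs.77326", "IGFBP3"),
    ("Hs.77326", "IGFBP3"),  # second array feature for the same UniGene
    ("Hs.99999", "HYPO1"),
]
published = ["Hs.77326", "Hs.119651", "Hs.174140"]

overlap = intersect_gene_lists(own, published)
unique_unigenes = {unigene for unigene, _ in overlap}
print(len(overlap), "overlapping features,", len(unique_unigenes), "unique UniGenes")
```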

In the first gene list, a total of 265 features, containing 245 unique UniGenes
from the microarray used herein, were observed to overlap (Table 2). Hierarchical
clustering analyses based on expression levels of these 265 'overlap' features
separated the tissue set into two distinct groups of tumor and non-tumor, with five
tissue samples misclassified. Such clustering was significant (P_a < 1x10^-6) based on
random permutation testing of sample labels. The likelihood of a randomly chosen set
of 265 features producing five or fewer samples misclassified was low (P_b = 1.5x10^-3).
Thus, these 265 'overlap' features could distinguish HCC tumor from non-tumor liver
with reasonable precision, and the features were unlikely to appear by chance. Among
these genes were smaller subgroups characterized by distinct gene expression
signatures involving potentially different pathways. A cholesterol biosynthetic pathway
was characterized by higher expression in HCC tumors of genes for the enzymes
SQLE, ACLY and FDPS. A subgroup involved in growth and differentiation was
characterized in the HCC tumor tissues by lower expression of ESR1, IGFBP3 and
PDGFRa, and high expression of PPIB1. A subgroup of bZIP transcription factors
ATF3, FOS, JUN, and MYBL2 was down-regulated in the HCC
tumor tissues.
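
The random permutation testing of sample labels mentioned for the clustering result can be sketched as follows. This is a minimal illustration, not the procedure actually used: it assumes exactly two clusters and two tissue labels, holds the cluster assignments fixed while the labels are shuffled, and uses hypothetical sample data. The companion estimate based on randomly chosen feature sets would additionally require re-clustering for each draw and is not shown.

```python
import random


def misclassified(labels, clusters):
    """Count samples whose cluster disagrees with their label, using the
    better of the two possible cluster-to-label assignments.
    Assumes exactly two distinct labels and two distinct clusters."""
    first_cluster = sorted(set(clusters))[0]
    first_label = sorted(set(labels))[0]
    mismatches = sum(
        1 for label, cluster in zip(labels, clusters)
        if (cluster == first_cluster) != (label == first_label)
    )
    return min(mismatches, len(labels) - mismatches)


def permutation_p_value(labels, clusters, n_perm=2000, seed=0):
    """Estimate how often shuffled labels match the clusters at least
    as well as the true labels do (add-one corrected)."""
    rng = random.Random(seed)
    observed = misclassified(labels, clusters)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if misclassified(shuffled, clusters) <= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical data: 10 tumor (T) and 10 non-tumor (N) samples, one tumor misplaced.
labels = ["T"] * 10 + ["N"] * 10
clusters = ["A"] * 9 + ["B"] * 11

observed, p_value = permutation_p_value(labels, clusters)
print(observed, "misclassified; permutation P =", p_value)
```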



Table 2. Intersection of microarray expression dataset with HCC Genes 



UniGene 
Identifier 


Gene 


Description 


GenBank No. 


Hq 10140ft 


DV^MI £. 


branched chain aminotransferase 2,
mitochondrial


dc^o4Zo5, 
AA436410 






branched chain aminotransferase 2,
mitochondrial


NM_001190,
AA436410 


Hs 102664 




vesicle-associated membrane protein 4 


ai nicooc 
/M-Uoo^yo, 

AA424813 i 


Hs 10319 

« • w • | \J\J 1 & 




UDP glycosyltransferase 2 family,
polypeptide B7 


AA746229 


Hs.10359 




ESTs 


AW316760, 
AA630881 


Hs.106061 


RDBP 


RD RNA-binding protein 


X16105, 
AA056390 


Hs.1 07253 


DKFZP761F241 


hypothetical protein DKFZp761F241 
homo sapiens cDNA: FLJ20925 fis, clone 
ADSE00963 


AW519080, 
R20416 


Hs.108441 


HAAO 


3-hydroxyanthranilate 3,4-dioxygenase 


NM 012205, 
T80846 


Hs.108636 


C1orf9


chromosome 1 open reading frame 9 
CH1 MEMBRANE PROTENIN CM 


BE466870, 
N36176 


Hs.110613 


SMG1 


PI-3-kinase-related kinase SMG-1 
KIAA0220 KIAA0220 protein


AB007881, 
R97225 


Hs.11314 


DKFZP564N13 
6 


DKFZP564N1363 protein


AI360105, 
T87343 


Hs.115617 


CRHBP 


corticotropin releasing hormone-binding 
protein


NM 001882, 
AA286752 


Hs.1 18087 


KIAA0610 


KIAA0610 protein 


AB011182, 
N38860 


Hs.118638


NME1 


non-metastatic cells 1, protein (NM23A) 
expressed in 


AA1 47871, 
AA644092 


Hs.1 18666 


PP591 


hypothetical protein PP591 

human clone 23759 mRNA. partial cds 


U79241, 
AA626336 


Hs.1 19651 


GPC3 


glypican 3 


U50410, 
AA775872 


Hs.12451 
Hs.12482 


EMAPL 

GNPAT ! 


echinoderm microtubule-associated
protein-like
glyceronephosphate O-acyltransferase


NM 004434, 

AA447196 

AF043937, 
Hs.125180
Hs 125359 


" GHR " ™ 
THY1 


growth hormone receptor
Thy-1 cell surface antigen 


AA486845 
X06562. N70358 
N94350, 
AI346653 

AA428836 


Hs.1265 


BCKDHB 


branched chain keto acid dehydrogenase
E1, beta polypeptide (maple syrup urine
disease)


NM_000056,
AA4277^Q 


Hs.1279 


C1R 


complement component 1 , r 
subcomponent 


M14058, 
AA041382 
AF045649, 
T69603 


Hs. 13999 


KIAA0700 


KIAA0700 protein 


AI018400, 
N55167 


Hs.1430 


F11 


coagulation factor XI (plasma 
thromboplastin antecedent)


AF045649, 
R88990 


Hs.14453 


ICSBP1 


interferon consensus sequence binding 
protein 1


AW964220, 
N62269 


Hs.14453 


ICSBP1 


interferon consensus sequence binding 


AA514545, 
N62269 


Hs.144904 


NCOR1 


nuclear receptor co-repressor 1 


AB028970, 
AA085748 


Hs.144904 


NCOR1 


nuclear receptor co-repressor 1 


AA468619, 
AA085748 j 


Hs.145567 


AF038169 


hypothetical protein 


AI694342, 
AA406301 


Hs.145567 


AF038169 


hypothetical protein 


AI963556, 
AA406301 


Hs.146360 


IFITM1 


interferon induced transmembrane protein 
1 (9-27) 


AA428847, 
AA419251 i 


Hs.14838 


FLJ10773 


likely ortholog of mouse NPC derived 
proline rich protein 1 


AA044181, 

R93542, 

AA401264 


Hs.15087 


C1orf16 


chromosome 1 open reading frame 16 

KIAAn9*?n klAAnox^n <-%^n^ . . 
rviAVM/^ou iwvmuzou Qene product 


D87437, 
AA431423 


Hs.151518 


TARBP1 


TAR (HIV) RNA-binding protein 1 


NM 005646, 
N62244 


Hs.15154 


SRPX 


sushi-repeat-containing protein, X 
chromosome 


NM 006307, 
AA448569 


Hs.1531 


EHHADH 


enoyl-Coenzyme A, hydratase/3-
hydroxyacyl Coenzyme A dehydrogenase


L07077, R02373 


Hs.1531 


EHHADH 


enoyl-Coenzyme A, hydratase/3-
hydroxyacyl Coenzyme A dehydrogenase


AI800553, 
R02373 


Hs.1 53357 
Hs.1 54890 


PLOD3 
FACL2 


procollagen-lysine, 2-oxoglutarate 5-
dioxygenase 3


AF046889, 
AA459305 


Hs.1 55079 


PPP2R5A 


fatty-acid-Coenzyme A ligase, long-chain 2
protein phosphatase 2, regulatory subunit
B (B56), alpha isoform


D10040. T73556 

AA234460, 

R59164 


Hs.1 55560 


CANX 


calnexin 


AA203197, 

AA1 26265 


Hs.1 55637 


PRKDC 


protein kinase, DNA-activated, catalytic 
polypeptide 


U34994, R27615 


Hs.1 55956 


NAT1 


N-acetyltransferase 1 (arylamine N- 
acetyltransferase) 


R79401.T67128 


Hs.1 571 48 


MGC13204 


hypothetical protein MGC13204 

Homo sapiens cDNA FLJ11883 fis, clone


BE262748, 
N62451 
Hs.1578


BIRC5 


baculoviral IAP repeat-containing 5 
(survivin) 


AW247335, 
AA460685 


Hs.1 59301 


IL18R1 


interleukin 18 receptor 1 


U43672, 
AA482489 


Hs.1 6031 8 


FXYD1 


FXYD domain-containing ion transport
regulator 1 (phospholemman) 


AI125364, 
H57136 


Hs.1 60786 


ASS 


argininosuccinate synthetase 


BE393272, 
AA676466 


Hs 16341 


MAWBP


MAWD binding protein 

ESTs, Weakly similar to predicted using

Genefinder [C.elegans]


R54416 


Hs.16426 


PODXL 


podocalyxin-like


BE395330, 1 
N64508 


Hs.1 66891 


RFX5 


regulatory factor X, 5 (influences HLA class 
II expression) 


AL050135, 
AA418045 


Hs.1 67382 


NPR1 


natriuretic peptide receptor A/guanylate 
cyclase A (atrionatriuretic peptide receptor 
A) 


AA598841 


Hs 167*520 


CYP2C9


cytochrome P450, subfamily IIC 

(mephenytoin 4-hydroxylase), polypeptide 
9 


M61857. R89491 


Hs.169517 


ALDH1B1 


aldehyde dehydrogenase 1 family, member 
B1 


M63967, R93550 


Hs.169756 


C1S 


complement component 1, s 
subcomponent 


NM 001734, 

AA055520, 

T62048 


Hs.169907 


GSTA4 


glutathione S-transferase A4 


AF025887. 
AA1 52346 


Hs.1 70001 


EIF2B2 


eukaryotic translation initiation factor 2B, 
subunit 2 (beta, 39kD)


AA678061, 
R86304 


Hs.1 70001 


EIF2B2 


eukaryotic translation initiation factor 2B,
subunit 2 (beta, 39kD) 


AF035280, 
R86304 


Hs.1 701 33 
Hs.1 71 955 


FOXO1A
TROAP 


forkhead box O1A (rhabdomyosarcoma)


AF032885, 
AA448277 


Hs.1 72665 


MTHFD1 


trophinin associated protein (tastin) 
methylenetetrahydrofolate dehydrogenase 
(NADP+ dependent), 
methenyltetrahydrofolate cyclohydrolase, 
formyltetrahydrofolate synthetase 


U04810. H94949 

NM 005956, 
H10778 


Hs.1 7371 7 


PPAP2B 


phosphatidic acid phosphatase type 2B 


AI458142, 
T71976 


Hs.1 73880 


IL1RAP


interleukin 1 receptor accessory protein


AB006537, 

R35902, 

AA256132 


Hs.1 741 40 


ACLY 


ATP citrate lyase


AW967351, 
H08547 


Hs.1 74220 


CYP2C8 


cytochrome P450, subfamily IIC 
(mephenytoin 4-hydroxylase), polypeptide 
8 


M17398, N53136 


Hs.1 77592 


RPLP1 


ribosomal protein, large, P1 


AW963733, 
AI732304 


Hs.17767 
Hs.1 7971 8 


KIAA1554 
MYBL2 


KIAA1554 protein 

v-myb avian myeloblastosis viral oncogene 


AI625594, 
AA857573, 
H17860 
X13293. 
Hs.180383


DUSP6 


homolog-like 2

dual specificity phosphatase 6 


AA456878 

AB013382, 
AA630374 


Hs.180919 


ID2 


inhibitor of DNA binding 2, dominant
negative helix-loop-helix protein


AI950041, 
H82442 


Hs.181345 


SAH 


SA (rat hypertension-associated) homolog 


AI632754, 
N73827 


Hs.1 8201 8 


IRAK1


interleukin-1 receptor-associated kinase 1 


NM 001569, 

AI202323, 

AA683550 


Hs.18212 


DXS9879E 


DNA segment on chromosome X (unique)
9879 expressed sequence 


W73156, 
AA479062 


Hs.1 82575 


SLC15A2 


solute carrier family 15 (H+/peptide 


S78203, 
AA425352 


Hs.1827 


NGFR 


nerve growth factor receptor (TNFR 
superfamily, member 16)


NM 002507, 
R55303 


Hs.183858 


TIF1 


transcriptional intermediary factor 1 


AF1 19042, 
R38345, 

AA016972 ! 


Hs.1 8443 


ALDH8A1 

IvLmi-Jt l\Jr\ | 


aldehyde dehydrogenase 8 family, member 

r\ I 

ESTs 


AI051566
N70701 


Hs.1 84697 




Homo sapiens clone 23785 mRNA 
sequence 


AF035307, 
AA041362, 
AA663440 


Hs.1 8676 


SPRY2 


sprouty (Drosophila) homolog 2 


NMJ005842, 
AA453759 


Hs.1 94660 


CLN3 


ceroid-lipofuscinosis, neuronal 3, juvenile 
(Batten, Spielmeyer-Vogt disease)


AW249073, 
W37752 


Hs. 194673 


PEA15 


phosphoprotein enriched in astrocytes 15 


Y13736, 
AA293211 


Hs.1 9554 


C1orf2 


chromosome 1 open reading frame 2 


NM 006589, 
H11464 


HS.1 98282 
Hs.19904 


PLSCR1 
CTH 


phospholipid scramblase 1 


AB006746, 
N25945 


Hs.20144 


SCYA14 


cystathionase (cystathionine gamma-lyase)
small inducible cytokine subfamily A (Cys- 

mpmhor A A 


S52784. R07167 

NM_004166, 

R96626 


Hs.2030 


THBD 


thrombomodulin 


NM_000361, i 
H59861 . 


Hs.20315 


IFIT1 


Interferon-induced protein with 


NM_001548, 
AA1 57787 


Hs.2128 


DUSP5 


dual specificity phosphatase 5 


NM_004419, 
W65460 


Hs.213289 


LDLR 


low density lipoprotein receptor (familial 
hypercholesterolemia) 


NM 000527, 
AA504461 


Hs.21413 


SLC12A5 


solute carrier family 12, 
(potassium/chloride transporter) member 5 


U79245, j 
AA166885 


Hs.21635 


TUBG1 


tubulin, gamma 1 


NM 001070, i 
T77732 


Hs.2178 


H2BFQ 


H2B histone family, member Q 


BE245642, 
AA010223 


Hs.227656 

Hs.23642 
Hs.237356 


XPR1 

HSU79266 
SDF1 


xenotropic and polytropic retrovirus 
receptor 

protein predicted by clone 23627
stromal cell-derived factor 1 


AL1 37583, 
AA453474 
U79266, W95346 
AA442810. 
Hs.237356


SDF1 


stromal cell-derived factor 1 


AA447115 

L36033, 

AA447115 


Hs.23767 


FLJ12666 


hypothetical protein FLJ12666 

Homo sapiens cDNA FLJ1266 fis, clone 

NT2RM4002256 


AW952494, 
H10192, 
AA1 15300, 
AA131466 


Hs.239 


FOXM1 


forkhead box M1 


U83113, 
AA1 29552 


Hs.239069 


FHL1 


four and a half LIM domains 1 


AA725097, 
AA455925 


Hs.239758 


FLJ12389 


hypothetical protein FLJ12389 similar to
acetoacetyl-CoA synthetase 
Homo sapiens cDNA FLJ12389 fis, clone 
MAMMA1002671, weakly similar to Acetyl-
coenzyme A synthase (EC 6.2.1.1)


AI697801, 
R48270 


Hs.241561 


PRSS2 


protease, serine, 2 (trypsin 2) 


U66061, 
AA284528 


Hs.2430 


TCFL1 


transcription factor-like 1 


AA705337, 
AA443950 


Hs 24950 




regulator of G-protein signalling 5 


AI674877, 

N34362, 

AA668470 


Hs.252587 


PTTG1 


pituitary tumor-transforming 1 


AA203476, 
AA430032 


Hs.25313 


MCRS1 


microspherule protein 1 


AF068007, 
AA488757 


Hs.25475 


AQP7 


aquaporin 7 


AW779701, 

H27752, 

AI075055 


Hs.256583 


ILF3 


interleukin enhancer binding factor 3, 90kD 


AF007140, 
AA449048 


Hs.256583 


ILF3 


interleukin enhancer binding factor 3, 90kD 


NM_012218, 
AA449048 


Hs.25797 


SF3B4 


splicing factor 3b, subunit 4, 49kD 


NM_005850, 
AA699361 


Hs.262958 


DKFZP434B04 
4 


hypothetical protein DKFZp434B044 

CO 1 O 


AA541776, 
AA460304 


Hs.26403 


GSTZ1 


glutathione transferase zeta 1 
(maleylacetoacetate isomerase)


U86529, 
AA428334 


Hs.264330 


ASAHL 


N-acylsphingosine amidohydrolase (acid
ceramidase)-like 


BE267007. 
W47576 


Hs.267289 


POLA 


polymerase (DNA directed), alpha


NM 016937 
AA707650 


Hs.2699 


GPC1 


glypican 1 


NM_002081, 
AA455895 


Hs.270256 




Homo sapiens clone IMAGE:1963178, 

mRNA sequence 

ESTs 


AI355014, 
R10140 


Hs.270845 
Hs.279607 


KNSL5 

CAST i 


kinesin-like 5 (mitotic kinesin-like protein 1) 
calpastatin 


H63163, 
AA452513 


Hs.284142 


C21orf4 


chromosome 21 open reading frame 4 


U38525. H78523 

BE256559, 

W69668 


Hs.284142 


C21orf4 


chromosome 21 open reading frame 4 


BE1 42872, 
W69668 
Hs.28465




Homo sapiens cDNA: FLJ21869 fis, clone
HEP02442 


AW582012, 
R63929 


Hs.288650 


AQP4 


aquaporin 4 


NM 001650, 
H09087 


Hs.291904 


DXS1357E 


accessory proteins BAP31/BAP29 


Z31696, 
AA625628 


Hs.2934 


RRM1 


ribonucleotide reductase M1 polypeptide 


X59543, 
AA633549 


Hs.293970 


ALDH6A1 


methylmalonate-semialdehyde 
dehydrogenase 


C00821, H63534, 

AA196160, 

H63534 


Hs.293970 


ALDH6A1 


methylmalonate-semialdehyde 
dehydrogenase 


M93405, N62179, 

AA196160, 

H63534 


Hs.294151 


KIAA1917 


KIAA1917 protein 


BE222511, 
AA452113 


Hs.295923 


SIAH1 


seven in absentia (Drosophila) homolog 1 


AA935716, 
T71889 


Hs.296049 


MFAP4 


microfibrillar-associated protein 4 


L38486, 
AA442695 


Hs.296259 


PON3 


paraoxonase 3 


L48516, R95740, 
T57069 


Hs.296341 


CAP2 


adenylyl cyclase-associated protein 2 


AW779995, 
AA040613 


Hs.296341 


CAP2 


adenylyl cyclase-associated protein 2 


U02390, 
AA040613 


Hs.30151 




ESTs, Weakly similar to JC5238 
galactosylceramide-like protein, GCP 
[H.sapiens] 


N73570 


Hs.30340 


KIAA1165 


hypothetical protein KIAA1165


AB032991, 
AA449330 


Hs.30340 


KIAA1165 


hypothetical protein KIAA1165 


AA770150, 
AA449330 


Hs.3416 


ADFP 


adipose differentiation-related protein | 


NM_001122, 

AA700054, 

AA142916 


Hs.35120 


RFC4 


replication factor C (activator 1) 4 (37kD) 


AA600213, 
N93924 


Hs.3530 


FUSIP2 


FUS-interacting protein (serine-arginine 
rich) 2 

I ^-associated serlne-araimne protein 2 


AK001656, 
H11049 

1 1 1 1 utt 


Hs.36102 




ESTs, Highly similar to SMHU1B
metallothionein 1B [H.sapiens]


R9Q907 VA70700 


Hs.37009 


ALPI 


alkaline phosphatase, intestinal 


NM 001631, 

AA1 90871 


Hs.38163 




Homo sapiens, Similar to hypothetical 
protein, MGC:7035, clone MGC:20737 
IMAGE:4563636, mRNA, complete cds 
ESTs 


AW074863, 
H63116 


Hs.3873 


PPT1 


palmitoyl-protein thioesterase 1 (ceroid-
lipofuscinosis, neuronal 1, infantile)


AL037943, 
AA034250 


Hs.388 


NUDT1 


nudix (nucleoside diphosphate linked 
moiety X)-type motif 1


AI656937, ; 
AA443998 


Hs.4 

Hs.41726 


ADH1B 
SERPINB8 


alcohol dehydrogenase 1B (class I), beta 
polypeptide 

serine (or cysteine) proteinase inhibitor, 


M24317, N93428 
NM 002640. 
Lie yi«1 07


LOC55977 


clade B (ovalbumin), members 

hypothetical protein 24636 


I W60100 
AI066576, 


nS.4^ooO 


ZWINT 


ZW10 interactor 


N62562 

AW409765, 
AA706968 


rlS.44od£ 

Hs.460 


1 inn 

UBO 
ATF3 


diubiquitin 


NM 006398, 
N33920 


Hs.4742 


GPAA1 


activating transcription factor 3

anchor attachment protein 1 (Gaalp, 
yeast) homolog 


N39944, H21041 
NM 003801, i 


Hs.4756 


FEN1 


flap structure-specific endonuclease 1


AA455301 

BE278623. ] 
AA620553 I 


Hs.4788 


NCSTN 
KIAA0253


INIl^ClOM III 

nicastrin 


D87442, R96527 I 


Hs.4788 


NCSTN 
KIAA0253 


INIUClOU III 

nicastrin 


BE1 79772, 
R96527 J 


Hs.48348 


HH114 


hypothetical protein HH114

Homo sapiens clone HH114 unknown 

mRNA 


AA428370, 
AA130117 | 


Hs.4854 


CDKN2C 


cyclin-dependent kinase inhibitor 2C (p18 


AF041248, | 
N72115 


Hs.49265 




ESTs 


AI141174, 

AI1 40241 J 


Hs.49912 


PXMP2 


peroxisomal membrane protein 2 (22kD) 


BE393339, 1 
N70714 | 


Hs.50758 


SMC4L1 


SMC4 (structural maintenance of 
chromosomes 4, yeast)-like 1
CAP-C chromosome-associated 

DolvnpntfHo P 


AB019987, 
AA452095 


Hs.50966 


CPS1 


carbamoyl-phosphate synthetase 1, 

mitochondrial


Y15793 N68399 I 


Hs.50966 


CPS1 


carbamoyl-phosphate synthetase 1 , j 

mitochondrial


AA1 13231, | 
N68399 I 


Hs.5333 


KIAA0711 


KIAA0711 gene product


NM_014867, | 
AA702544 I 


Hs.5719 


CNAP1 


chromosome condensation-related SMC- 
associated protein 1 ( 


D63880, 

AA668256 I 


1 l — a ex 

Hs.5719 


CNAP1 


chromosome condensation-related SMC-
associated protein 1


NM_014865. ) 
AA668256 


HS.572 


ORM1 


orosomucoid 1 


X02544, ~~ j 
AA700876 


MS. i>74 . 


FBP1 


fructose-1,6-bisphosphatase 1 


M 19922, 

AA699427 I 


Hs.5897 




Homo sapiens mRNA; cDNA
DKFZp586P1622 (from clone

DKFZp586P1622)


AI383214, 1 
T59658 1 


Hs.61638 


MYO10 


myosin X 1 


AH 98676. 
AA1 87977 


Hs.6551 


ATP6S1 


ATPase, H+ transporting, lysosomal 
(vacuolar proton pump), subunit 1 I 


NM 001183, 
AA487588 I 


Hs.6566 


TRIP13 


thyroid hormone receptor interactor 13 


BE090548. 
AA630784 J 


Hs.66 


IL1RL1


interleukin 1 receptor-like 1 I 


AB012701, I 


Hs.6838 


ARHE 


ras homolog gene family, member E T 
ESTs | 


AA125917 

W03441, 

AA443302 | 
Hs.71465


SQLE 


squalene epoxidase 


AF098865, 
R01118 


Hs.71622 


SMARCD3 


SWI/SNF related, matrix associated, actin 
dependent regulator of chromatin,
subfamily d, member 3 
- wccamy similar to lw\A0319 |H, sapiens] 


U66619, 
AA136103 


Hs.737 


ETR101 


immediate early protein 


AA1 94084, 
AA496359 


Hs.738 


RPL14 


ribosomal protein L14 


BE410686, J 
AA486533 


Hs.73986 


CLK2 


CDC-like kinase 2 


NM 003993, 
AA282845 


Hs.740 


PTK2 


PTK2 protein tyrosine kinase 2 


NM 005607, 
AA291486 


Hs.74120 


APM2 


adipose specific 2 


AI093004, 
W94684 


Hs.74170 


MT1E 


metallothionein 1 E (functional) 


H72532, 
AA872383 


Hs.74561 


A2M 


alpha-2-macroglobulin 


NM 000014, 
AA775447 


Hs.74566 


DPYSL3 


dihydropyrimidinase-like 3 


D78014, 
AI831083 


Hs.74579 


KIAA0263 


KIAA0263 gene product 


D87452, 
AA634464 


Hs.74615 


PDGFRA 


platelet-derived growth factor receptor, 
alpha polypeptide 




Hs.74615 


PDGFRA 


platelet-derived growth factor receptor, 
alpha polypeptide 


AW887370, 
H23235 


Hs 7471 1 


DNAJC8


DnaJ (Hsp40) homolog, subfamily C,
member 8 

Splicinq factor similar to dnaJ 


MAO J 0009, 

T60163 


Hs.748 


FGFR1 


fibroblast growth factor receptor 1 (fms- 
related tyrosine kinase 2, Pfeiffer
syndrome) 


X66945, R54610 


Hs.75103 


YWHAZ 


tyrosine 3-monooxygenase/tryptophan 5- 
monooxygenase activation protein, zeta 
polypeptide 


AA911031, 
AA609598, 
H94670, 
AA485749 


Hs.75103 


YWHAZ 


tyrosine 3-monooxygenase/tryptophan 5- 
monooxygenase activation protein zeta 
polypeptide 


BE315169 
AA609598, 
H94670, 
AA485749 


Hs.75106 


CLU 


clusterin (complement lysis inhibitor, SP- 
40,40, sulfated glycoprotein 2, 
testosterone-repressed prostate message 
2, apolipoprotein J)


AA292226, 
AA4641M 


Hs.75117 


ILF2 


interleukin enhancer binding factor 2, 45kD


AA307289, 
AA894687,

H95638 


Hs.75117 


ILF2


interleukin enhancer binding factor 2, 45kD 


AA601029, 
AA894687, 
H95638 


Hs.75196 


BAT8 


HLA-B associated transcript 8 

G9A ankyrin repeat-containing protein


NM 006709, 
AA434117 


Hs.75216 


PTPRF 


protein tyrosine phosphatase, receptor 
type, F 


F08552, 
AA598513 
Hs.75318


TUBA1 


tubulin, alpha 1 (testis specific) 


X06956, R36063, 
AA1 80742 


Hs.75361 


PK1.3 


gene from NF2/meningioma region of 
22q1 2 


AB023200, 
AA700048 


Hs.75438 


QDPR 


quinoid dihydropteridine reductase 


AA159812, 
R38198 


Hs.75545 


IL4R 


interleukin 4 receptor 


X52425, 

AA292025 f 


Hs.75572 


CPB2 


carboxypeptidase B2 (plasma, 
carboxypeptidase U)


NM_001872, 
H47837 


Hs.75618 


RAB11A 


RAB11A, member RAS oncogene family 


BE122870, I 
AA025058 I 


Hs.75658 


PYGB 


phosphorylase, glycogen; brain 


U47025, "1 
AA922705 I 


Hs.75678 


FOSB 


FBJ murine osteosarcoma viral oncogene
homolog B 


L49169, T61948 


Hs.75812 


PCK2 


phosphoenolpyruvate carboxykinase 2
(mitochondrial) 


X92720, I 
AA1 86901 i 


Hs .76252 


EDNRA 


endothelin receptor type A 


D90348, I 
AA450009 


Hs.76325 


SLU 


co i ^, nigniy similar to ICsJ HUMAN 
IMMUNOGLOBULIN J CHAIN [H.sapiensl 

Step II splicing factor SLU7


AW1 72754, I 
T70057 J 


Hs.7645 


FGB 


fibrinogen, B beta polypeptide 


AW589878, 
H91121.T73858 


Hs.76461 


RBP4 


retinol-binding protein 4, plasma 


AF074657, 
T72076 


Hs.7647 


MAZ 


MYC-associated zinc finger protein
(purine-binding transcription factor)


BE264373, 
AA704613 I 


Hs.77256 


EZH2 


enhancer of zeste (Drosophila) homolog 2 


U52965, | 
AA428252 


Hs.77326 


IGFBP3 


insulin-like growth factor binding protein 3 


BE336944, 
AA598601 I 


Hs.77393


FDPS 


farnesyl diphosphate synthase (farnesyl
pyrophosphate synthetase,
dimethylallyltranstransferase,
geranyltranstransferase)


D14697, T65790 


Hs.77597 


PLK 


polo (Drosophila)-like kinase


X75932, 

AA629262 


Hs.77667 


LY6E 


lymphocyte antigen 6 complex, locus E 


NM 002346, I 
AA865464 


Hs.77854 


RGN 


regucalcin (senescence marker protein-30) 


AB032064, 
H05140 ! 


Hs.78045 


ACTG2 
TFPI2 


actin, gamma 2, smooth muscle, enteric 

tissue factor pathway inhibitor 2


NM_001615, I 
AA293402 I 


Hs.78465 


JUN 


v-jun avian sarcoma virus 17 oncogene 
homolog 


AI885769, j 
W96134 


Hs.78524


HTCD37 


TcD37 homolog 


AI263464, 
AA022472, 
AA456635 J 


Hs.78865 


TAF6 


TAF6 RNA polymerase II, TATA box 
binding protein (TBP)-associated factor 80 
kD 


NM 005641, | 
R19071 ! 


Hs.789 
Hs.78996 


GRO1
PCNA 


GRO1 oncogene (melanoma growth

stimulating activity, alpha) 

proliferating cell nuclear antigen 


NM 001511, I 
W46900 

AI624204, ! 
Hs.79078


MAD2L1 


MAD2 (mitotic arrest deficient, yeast, 
homolog)-like 1


AA450264 

NM_002358,

AA481076 


Hs.79081 


PPP1CC 


protein phosphatase 1, catalytic subunit, 


NM_002710, 

AA1 29930 


Hs.79088 


RCN2 


reticulocalbin 2, EF-hand calcium binding 
domain 

i tail i 


AL1 20373, 
AA598676 


Hs.79334 


NFIL3 


nuclear factor, interleukin 3 regulated


X64318. 
AA633811 


Hs.79404 


D4S234E 


neuron-specific protein 


AA975473, 
AA875888 


Hs.80248 


RBPMS 


RNA-binding protein gene with multiple
splicing 


D84107.T98807 


Hs.80658 


UCP2 


uncoupling protein 2 (mitochondrial, proton
carrier)


AW192446, 
H61242 


Hs.81170 


PIM1 


pim-1 oncogene


M54915 

AA447730 


Hs.8136 


EPAS1 


endothelial PAS domain protein 1 
numu sapiens cione ^3o9omRNA 
sequence 


U51626, R24882 


Hs.81687 


NME3 


non-metastatic cells 3, protein expressed 

in 
if i 


U29656, 

AA398218 


Hs.81848 
Hs.81892 


RAD21 
KIAA0101 


RAD21 (S. pombe) homolog


NM_006265, 
AA683102 


Hs.82042 


SLC23A1 


KIAA0101 gene product

solute carrier family 23 (nucleobase

transporters), member 1 


D14657.W68219 
D87075, N23756 


Hs.821 


BGN 


Biglycan 

Zinc finger protein homologous to Zfp92 in 

mouse


NM_001711. 
R77226 W*1fl1ft 


Hs.82112 


IL1R1 


interleukin 1 receptor, type I 


M27492, R56687, 
AA464525 


Hs.82273 


FLJ20152 


hypothetical protein 


AI536745, 
AA446864 


Hs.82503 




Homo sapiens cDNA FLJ30550 fis, clone 
BRAWH2001502 

Homo sapiens mRNA for 3'UTR of 
unknown protein i 


Y09836, 
AA670382 


Hs.8265 


TGM2 


transglutaminase 2 (C polypeptide, protein-
glutamine-gamma-glutamyltransferase)


M98479, j 
AA1 56324 


Hs.82794 


CETN2 


centrin, EF-hand protein, 2


NM_004344,
N72193 


Hs.82906 

Hs.8294 
Hs.82962 


CDC20 

KIAA0196
TYMS


CDC20 (cell division cycle 20, S.
cerevisiae, homolog)
KIAA0196 gene product


BE293657
NM_014846


Hs.83164 
Hs.83753 
Hs.86368 


COL15A1 

SNRPB 

CLGN 


thymidylate synthetase
collagen, type XV, alpha 1
small nuclear ribonucleoprotein
polypeptides B and B1


NM_001071
.01697 

BE252108 


Hs.86724 

Hs.87409 
Hs.8765 
Hs.8867 
Hs.8889 


GCH1 

THBS1 
RNAHP 
CYR61 
SHMT1 


calmegin

GTP cyclohydrolase 1 (dopa-responsive
dystonia)

thrombospondin 1

RNA helicase-related protein

cysteine-rich, angiogenic inducer, 61

serine hydroxymethyltransferase 1


NM_004362

Z29433

NM_003246
AI127821

Y12084
AI761724
Hs.89538
Hs.89691 



Hs.93210 



Hs.93597 



Hs.93597 
Hs.93832 



Hs.94360 
Hs.94382 



CETP 
UGT2B4 



C8A 



CDK5R1 



CDK5R1 
LOC54499 



MT1L 

ADK 

ZNF261 



(soluble)

cholesteryl ester transfer protein, plasma
UDP glycosyltransferase 2 family, 
polypeptide B4 



complement component 8, alpha 
polypeptide 



cyclin-dependent kinase 5, regulatory 
subunit 1 (p35)



cyclin-dependent kinase 5, regulatory 
subunit 1 (p35) 



putative membrane protein 



metallothionein 1L 
adenosine kinase 



M30185 
AF084200 



M16974 



T04872 



AW088206 



AW081809 



F26137 
NM_001123



Hs.9568 



Hs.95998 



Hs.9629 



Hs.9670 



ZNF261 



FRDA 



PRCC 



FLJ10948 



zinc finger protein 261 
zinc finger protein 261 



Friedreich ataxia 



papillary renal cell carcinoma 
(translocation-associated)



hypothetical protein FLJ10948 



X95808 



NM_005096



AW409831 



BE258195 



AA805411 
In the second gene list, a total of 230 features, containing 166 unique
UniGenes from the 218 significant gene list (containing 213 unique UniGenes)
identified herein (Table 1), were observed to overlap (Table 3). Hierarchical
clustering analysis based on the expression levels of these 230 'overlap' features
separated the tissue set into distinct tumor and non-tumor groups, with four tissue
samples misclassified. Random permutation of sample labels indicated that the
clustering was significant (P < 1x10^-8), and it was unlikely that a randomly chosen set
of 230 features could produce four or fewer samples misclassified (P < 1x10^-4). These
230 'overlap' features are therefore able to discern HCC tumor from non-tumor liver.
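
The label-permutation check described above can be illustrated with a minimal Python sketch. It assumes each sample already carries a two-group cluster assignment (0 or 1) from some clustering step; the function names, the number of permutations, and the add-one correction are illustrative assumptions and are not taken from the specification.

```python
import random

def misclassified(labels, clusters):
    """Count samples whose cluster disagrees with their label, using the
    better of the two possible cluster-to-label mappings."""
    mismatches = sum(1 for lab, clu in zip(labels, clusters) if lab != clu)
    return min(mismatches, len(labels) - mismatches)

def permutation_p_value(labels, clusters, n_perm=2000, seed=0):
    """Estimate how often randomly shuffled sample labels agree with the
    clusters at least as well as the true labels do."""
    rng = random.Random(seed)
    observed = misclassified(labels, clusters)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if misclassified(shuffled, clusters) <= observed:
            hits += 1
    # Add-one correction (an assumption) so the estimate is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical data: 1 = tumor, 0 = non-tumor; one tumor sample falls in the wrong cluster.
true_labels = [1] * 10 + [0] * 10
cluster_ids = [1] * 9 + [0] * 11
observed_errors, p_value = permutation_p_value(true_labels, cluster_ids)
print(observed_errors, p_value)
```

With many more permutations, the same procedure can resolve much smaller probabilities than this toy example does.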

Table 3. Intersection of Significant Genes Identified Herein with HCC Genes 



UniGene 
Identifier 


Gene 


Description


GenBank No. 


Hs.103804


HNRPU


heterogeneous nuclear ribonucleoprotein U
(scaffold attachment factor A)


X65488, T97547,
AA496741


Hs.104143


CLTA


clathrin, light polypeptide (Lca)


AW974204,
AA113872


Hs.105465 


SNRPF 


small nuclear ribonucleoprotein polypeptide


AA649986,
Hs.108332


UBE2D2 


ubiquitin-conjugating enzyme E2D 2 
(homologous to yeast UBC4/5) 


mm oo^*v*o 

AA159600, 

AA431868 


Hs.10842 


RAN 


RAN, member RAS oncogene family 


NM_006325,
AA456636 


Hs.10848
Hs.108636


KIAA0187
C1orf9


KIAA0187 gene product


D80009, 
AA121504, 
AA129555, 
AA402812 


Hs.108689 


SREBF2 


chromosome 1 open reading frame 9

sterol regulatory element binding
transcription factor 2


BE466870, N36176
AA608556, 
AA701914, 
AA608556 


Hs.108809 
Hs.110713 


CCT7 
DEK 


chaperonin containing TCP1, subunit 7 (eta)


AA314436,
AA676588 


Hs.1119 


NR4A1 


DEK oncogene (DNA binding)
nuclear receptor subfamily 4, group A,
member 1


AI888504, R25377

NM_002135, 

N94487 


Hs.11355 


TMPO 


thymopoietin 


U09087, H21746,
AA676998, T63980


Hs.115617 


CRHBP 


corticotropin releasing hormone-binding 
protein 


NM_001882,
N26546, AA286752


Hs.117367


SLC22A1


solute carrier family 22 (organic cation
transporter), member 1


X98332, AA702013 


Hs.1174 


CDKN2A 


cyclin-dependent kinase inhibitor 2A 
(melanoma, p16, inhibits CDK4)


AI859822, 
AA877595 


Hs.118249


ARFGEF2


ADP-ribosylation factor guanine nucleotide-
exchange factor 2 (brefeldin A-inhibited)


AA099582, N34053


Hs.118638
Hs.11902


NME1 
MYLE 


non-metastatic cells 1, protein (NM23A) 
expressed in

MYLE protein


AA147871,

AA644092

AA628977, T68845


Hs.119651
Hs.12107


GPC3
BC-2


glypican 3

putative breast adenocarcinoma marker
(32kD)


U50410, AA775872
AF042384, N25578 


Hs.12482 


GNPAT 


glyceronephosphate O-acyltransferase


AF043937, 
W72079 


Hs.125180 


GHR 


growth hormone receptor 


X06562, N70358, 
AA775738 


Hs.13340


HAT1 


histone acetyltransferase 1 


AF030424, 
AA625662 


Hs.148495

Hs.151787
Hs.152931


PSMD4

U5-116KD
LBR


proteasome (prosome, macropain) 26S
subunit, non-ATPase, 4

U5 snRNP-specific protein, 116 kD


AA604027,
AA450227
D21163, AA779221


Hs.15318
Hs.154073


HAX1 
UGTREL1 


lamin B receptor 
HS1 binding protein 

UDP-galactose transporter related 


L25931, AA099136
BE260953, R76263
AW192554,
R41839


Hs.155079


PPP2R5A


protein phosphatase 2, regulatory subunit B
(B56), alpha isoform


AA234460, R59164


Hs.155637


PRKDC 


protein kinase, DNA-activated, catalytic 
polypeptide 


U34994, R27615 


Hs.156110
Hs.1600


IGKC
CCT5


immunoglobulin kappa constant
chaperonin containing TCP1, subunit 5


AW404507,
AI732289,
AA476918,
AA486362
D43950,
(epsilon)


AA126599,
AA629692 


Hs.1624 
Hs.16341 


EFNA1 
MAWBP 


ephrin-A1 


NM 004428, 
AA857015 
AI866254. R54416 


Hs.16426 
Hs.1657 


PODXL 
ESR1 


podocalyxin-like
estrogen receptor 1 


BE395330, N64508 
AL078582,
AA164585,
AA291702


Hs.166468


PDCD5 


programmed cell death 5 


AA452724, 
AA156940


Hs.166891


RFX5


regulatory factor X, 5 (influences HLA class II expression)


AL050135, 
AA418045 


Hs.1674 


GFPT1 


glutamine-fructose-6-phosphate

transaminase 1


NM_002056,
AA478571


Hs.169407


SACM2L


SAC2 (suppressor of actin mutations 2,
yeast, homolog)-like


AK001725,
AA454836


Hs.173274


ICAP-1A 


integrin cytoplasmic domain-associated 


AF012023, 
AA456882 


Hs.174140


ACLY 


ATP citrate lyase 


AW967351, 
H08547, AA126708


Hs.174220


CYP2C8


cytochrome P450, subfamily IIC
(mephenytoin 4-hydroxylase), polypeptide 8


M17398, N53136


Hs.177592


RPLP1 


ribosomal protein, large, P1 


AW963733,
AI732304


Hs.180414


HSPA8


heat shock 70kD protein 8


AW249010,
H64096, AA629567


Hs.180446


KPNB1 


karyopherin (importin) beta 1 


L38951, 
AA121732, 
AA251527, 
AA425006 


Hs.180577


GRN


granulin


AI375908,

AI054019,

AA496452


Hs.180610


SFPQ


splicing factor proline/glutamine rich
(polypyrimidine tract-binding protein-
associated)


X70944, R96240, 
N24024, AA425258 


Hs.181357


LAMR1


laminin receptor 1 (67kD, ribosomal protein
SA)


AW328280,
AA629897


Hs.181444
Hs.184222


LOC51235
DSCR1


hypothetical protein


AI190653,
AA455565


Hs.18443
Hs.194673


ALDH8A1
PEA15


Down syndrome critical region gene 1

aldehyde dehydrogenase 8 family, member 
A1 


U85267, AA629707 
AI051566, N70701 


Hs.1989


SRD5A2


phosphoprotein enriched in astrocytes 15
steroid-5-alpha-reductase, alpha polypeptide 2


Y13736, AA293211
M74047, AI420552


Hs.199067


ERBB3


v-erb-b2 avian erythroblastic leukemia viral
oncogene homolog 3


AI565773 N24QRR
AA042878


Hs.199263


MT1L
STK39


metallothionein 1L
serine threonine kinase 39 (STE20/SPS1
homolog, yeast)


F26137, H84871 


Hs.2 


NAT2 


N-acetyltransferase 2 (arylamine N- 
acetyltransferase) 


D90040, AI262683 


Hs.20144 


SCYA14 


small inducible cytokine subfamily A (Cys- 
Cys), member 14 


NM_004166,
R96626
Hs.20716


TIM17


translocase of inner mitochondrial membrane
17 homolog A (yeast)


AW247564,
AA708446


Hs.22785


GABRE


gamma-aminobutyric acid (GABA) A

receptor, epsilon


NM_004961,
H63532 


Hs.236774 


HMG17L3 


high-mobility group (nonhistone

chromosomal) protein 17-like 3


U90549, R17124


Hs.236828 
Hs.237356 


WHIP 
SDF1 


Werner helicase interacting protein 
stromal cell-derived factor 1 


AA481600,
AA188168


Hs.23767


FLJ12666


hypothetical protein FLJ12666 


L36033, AA447115

AW952494, 

AA432056 


Hs.24485 


CSPG6 


chondroitin sulfate proteoglycan 6 (bamacan) 


NM_005445,
W40150,

AA463410


Hs.245710


HNRPH1


heterogeneous nuclear ribonucleoprotein H1
(H)


BE296051,
R11018, W96058


Hs.247324 


MRPS14 


mitochondrial ribosomal protein S14 


AW973521,
T51290, AA460831


Hs.24950 


RGS5 


regulator of G-protein signalling 5 


AI674877, N34362, 
AA668470 


Hs.25132 


KIAA0470 


KIAA0470 gene product 


NM_014812,

AI049669,

AA167129,

AA187982


Hs.252229


MAFG


v-maf musculoaponeurotic fibrosarcoma
(avian) oncogene family, protein G


AF059195,
N21609, AA045436


Hs.25647


FOS


v-fos FBJ murine osteosarcoma viral
oncogene homolog


V01512, R12840,
N36944, AA485377


Hs.25797 


SF3B4 


splicing factor 3b, subunit 4, 49kD 


NM_005850,
AA699361 


Hs.26403 


GSTZ1 


glutathione transferase zeta 1 
(maleylacetoacetate isomerase)


U86529, AA428334 


Hs.26433 


DPAGT1 


dolichyl-phosphate (UDP-N-
acetylglucosamine) N- 

acetylglucosaminephosphotransferase 1 
(GlcNAc-1-P transferase) 


Z82022, R55619, 
AA452517 


Hs.271980 


MAPK6 


mitogen-activated protein kinase 6 


NM_002748,
AA603152, H17504


Hs.275163 


NME2 


non-metastatic cells 2, protein (NM23B) 
expressed in 


L16785,
AA422058,
AA496512


Hs.287797
Hs.291904


ITGB1
DXS1357E


integrin, beta 1 (fibronectin receptor, beta
polypeptide, antigen CD29 includes MDF2,
MSK12)

Homo sapiens, clone MGC: 17220
accessory proteins BAP31/BAP29


W38716,

AA037283,

W67173

Z31696, AA625628


Hs.2934 
Hs.293441 


RRM1 


Homo sapiens SNC73 protein (SNC73) 
mRNA, complete cds 


X59543, AA633549
AA290845,
H28469, H73590


Hs.296341 


CAP2 


adenylyl cyclase-associated protein 2 


AW779995,
AA040613


Hs.300697 
Hs.301005 


IGHG3 
H2AV 


immunoglobulin heavy constant gamma 3
(G3m marker)

histone H2A.F/Z variant


D78345,
AA740786,
N92646, AA465378
BE409809, H97000


Hs.301404


RBM3


RNA binding motif protein 3


NM_006743,
Hs.301819



Hs.3041 



Hs.3164 



Hs.321231 



Hs.323817 
Hs.332633 



Hs.333495 



Hs.334612 



Hs.334787 



Hs.342389 
Hs.349961 



Hs.356525 
Hs.3610 



Hs.36102 
Hs.4 



ZNF146 zinc finger protein 146



UNG2 



uracil-DNA glycosylase 2 



AA054287 
X70394, AA504351



NUCB2 



AA291356, 
AA4259Q0 



nucleobindin 2 



B4GALT3 | UDP-Gal:betaGlcNAc beta 1,4-

galactosyltransferase, polypeptide 3



AW951523,
W93954,
AA484939



DKFZP547E1010

BBS2 



Y12509, AA424578 



DKFZP547E1010 protein 
Bardet-Biedl syndrome 2 



DSS1 



NM_015607, 
AA406292,
AA418004
AA425759,
AA486738



Deleted in split-hand/split-foot 1 region




SNRPE | small nuclear ribonucleoprotein polypeptide E



X12466, AA678021 



MGC19556 hypothetical protein MGC19556 



PPIA 



peptidylprolyl isomerase A (cyclophilin A) 



BE379431, 
AA609463 



RPL6 



ribosomal protein L6 



AW732921, 
H72674 



FLJ12806 
KIAA0205 



ESTs, Weakly similar to CNG1_HUMAN 
cGMP-gated cation channel alpha 1 (CNG 

channel alpha 1) 

KIAA0205 gene product 



AW675430, 
AA629808 



Hs.41587 



Hs.431 



Hs.44532 
Hs.44585 



Hs.46440 



Hs.4756 



ADH1B 



ESTs, Highly similar to SMHU1B 
metallothionein 1B [H.sapiens]
alcohol dehydrogenase 1B (class I), beta 
polypeptide 



BE044582, T73794 
D86960, R91263



R99207, H72722 
M24317, N93428 



RAD50 RAD50 (S. cerevisiae) homolog 



Z75311, H99196,
AA126482



BMI1 



murine leukemia viral (bmi-1) oncogene 
homolog 



UBD 
TP53BP2 



AA884913, 
AA608856, 
T87514, W90704,
AA478036 



diubiquitin 
tumor protein p53-binding protein, 2 



SLC21A3 



FEN1 



Hs.50758 



Hs.5085 



Hs.52002 



Hs.554 



Hs.5662 



Hs.57101 



solute carrier family 21 (organic anion 
transporter), member 3 

flap structure-specific endonuclease 1 



NM_006398, 
N33920 
AI123916, H69077, 
N34418 



U21943. N62948 



SMC4L1 | SMC4 (structural maintenance of 
chromosomes 4, yeast)-like 1



BE278623, 
AA620553 



DPM1



AB019987, 
AA452095 



CD5L 



dolichyl-phosphate mannosyltransferase | AW173486,
polypeptide 1, catalytic subunit | AA004759



CD5 antigen-like (scavenger receptor 
cysteine rich family) 



SSA2 



NM_005894,
AA677254 



Sjogren syndrome antigen A2 (60kD, | NM_004600,

ribonucleoprotein autoantigen SS-A/Ro) | AA010351



GNB2L1 | guanine nucleotide binding protein (G
protein), beta polypeptide 2-like 1



MCM2 



minichromosome maintenance deficient (S.
Hs.5737
Hs.57783


KIAA0475
EIF3S9


cerevisiae) 2 (mitotin)

KIAA0475 gene product

eukaryotic translation initiation factor 3
subunit 9 (eta, 116kD)


AA454572 
AA524523, N73927 

U62583, AA676471 


Hs.6127 




Homo sapiens cDNA: FLJ23020 fis, clone
LNG00943 


AA054768, T67278 


Hs.6551 


ATP6S1 


ATPase, H+ transporting, lysosomal
interacting protein 1


NM_001183,
AA487588 


Hs.6650 


VPS45B 


vacuolar protein sorting 45B (yeast homolog) 


AA702845, 
AA885433 


Hs.6838 
Hs.695


ARHE

CSTB
PPIB
ZNF238
CHD4


ras homolog gene family, member E
cystatin B (stefin B)
peptidylprolyl isomerase B (cyclophilin B)


W03441, W86282,
AA443302


Hs.699
Hs.69997
Hs.74441


AI831499, 110374
BE386706,
N45313, AA481464
AJ223321, R79722

BE408958 NIAVro


zinc finger protein 238

chromodomain helicase DNA binding protein 4


Hs.75117 


ILF2 


interleukin enhancer binding factor 2, 45kD 


AA307289, 
AA894687, H95638


Hs.75183


CYP2E


cytochrome P450, subfamily IIE (ethanol-
inducible)


J02843, H50500 


Hs.75187 


KIAA0016 


translocase of outer mitochondrial
membrane 20 (yeast) homolog


D13641, AA644550


Hs.75258 


H2AFY 


H2A histone family, member Y 


AA307460,
AA486003


Hs.75354 
Hs.75412 


GCN1L1 
ARMET 


GCN1 (general control of amino-acid
synthesis 1, yeast)-like 1


D86973, R55250 


Hs.75424 


ID1 


arginine-rich, mutated in early stage tumors
inhibitor of DNA binding 1, dominant
negative helix-loop-helix protein


AA582041, R91550
S78825 AA4571«?ft I 


Hs.75546 


CAPZA2 


capping protein (actin filament) muscle Z- 
line, alpha 2 


U03851, AA083228


Hs.75659 


MPV17 


MpV17 transgene, murine homolog, 
glomerulosclerosis


NM_002437,
R55046


Hs.75678


FOSB


FBJ murine osteosarcoma viral oncogene

homolog B


L49169, T61948


Hs.75981 


USP14 


ubiquitin specific protease 14 (tRNA-guanine 
transglycosylase) 


NM_005151,
AA039511, T65861


Hs.76230


RPS10 


ribosomal protein S10 


AW245775, 
AA828564, 
AA828819, 
AI054003


Hs.76285


DKFZP564B167


DKFZP564B167 protein 


AI032331, 
AA621342 


Hs.76325 


IGJ


immunoglobulin J polypeptide, linker protein
for immunoglobulin alpha and mu
polypeptides

Homo sapiens, clone MGC: 24130


AW172754,
T90492, T70057


Hs.7655 


U2AF65 


U2 small nuclear ribonucleoprotein auxiliary 
factor (65kD)


AA936430, 
AA405748 


Hs.7720 


DNCH1 


dynein, cytoplasmic, heavy polypeptide 1


AB002323,
AA010589,
W78967


Hs.77254 


CBX1 


chromobox homolog 1 (HP1 beta homolog 
Drosophila ) 


AL046741,
AA448667
Hs.77326


IGFBP3


insulin-like growth factor binding protein 3


BE336944,
AA598601


Hs.77608 
Hs.78065 


SFRS9 
C7 


splicing factor, arginine/serine-rich 9 


AL021546,
N47892. AA4907?1 I 


Hs.78902 
Hs.79090 


VDAC2 
XPO1


complement component 7
voltage-dependent anion channel 2
exportin 1 (CRM1, yeast, homolog)


X86328, AA598478
AI015604,
AA857093, T66813


Hs.79110


NCL


nucleolin


D89729, T59055
AK000250,
AA433818


Hs.79150 
Hs.79162 


CCT4 
SSRP1 


chaperonin containing TCP1, subunit 4
(delta)


U38846, T98634,
AA088226,
AA598637


Hs.80343 


MMP15 


structure specific recognition protein 1 
matrix metalloproteinase 15 (membrane- 
inserted) 


AI635077, R11356
D85510, AA443300 


Hs.80552 


DPT 


dermatopontin


AW016451,
R48303


Hs.809 


HGF 


hepatocyte growth factor (hepapoietin A; 
scatter factor) 


X16323, R52797


Hs.80917 
Hs.80919 


AP3S1 
SYPL 


adaptor-related protein complex 3, sigma 1 
subunit 

synaptophysin-like protein


D63643, AA996044


Hs.81972
Hs.82043


SHC1
D123


SHC (Src homology 2 domain-containing)
transforming protein 1


S72481, AA427447
X68148, R52960,
T50498


Hs.82159


PSMA1


D123 gene product

proteasome (prosome, macropain) subunit,
alpha type, 1


U27112, AA448289
AI889267, R27585


Hs.82793 


PSMB3 


proteasome (prosome, macropain) subunit, 
beta type, 3


AI028114, 




CCT6A 


chaperonin containing TCP1, subunit 6A

(zeta 1)


AA620580 

L27706, 

AA872690, H842AR 


Hs.83753 
Hs.84790 


SNRPB 
C7orf14 


small nuclear ribonucleoprotein polypeptides 
B and B1 


BE252108,
AA599116


Hs.85119 
Hs.8765 


SMT3H1 
RNAHP 


chromosome 7 open reading frame 14 

SMT3 (suppressor of mif two 3, yeast) 
homolog 1 

RNA helicase-related protein 


D86978, AA60019O 
AA160893,
AA862629,
AA872379
AI814448, T56221


Hs.8867 
Hs.89525 


CYR61 
HDGF 


cysteine-rich, angiogenic inducer, 61 
hepatoma-derived growth factor (high-
mobility group protein 1-like)


Y12084, AA777187
BE259164,
AA453749


Hs.90093 


HSPA4 


heat shock 70kD protein 4 


AB023420, 
AA131267, 
AA433916


Hs.90370 


ARPC1A 


actin related protein 2/3 complex, subunit 1A
(41 kD)


Y08999,
AA490209,
AA016251,
AA151930


Hs.90744 


PSMD11 


proteasome (prosome, macropain) 26S 
subunit, non-ATPase, 11


AB003102 


Hs.99969 


FUS 


fusion, derived from t(12;16) malignant
liposarcoma


BE396632, 101207 
In the third gene list, a total of 68 unique UniGenes from the 218 significant
gene list (containing 213 unique UniGenes) identified herein (Table 1) were observed
to overlap (Table 4), and the likelihood that the overlap would arise by chance if the
two gene lists were totally independent was minuscule (P 0 <lxl0"*).
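
The specification does not state how this chance probability was computed. One common approach, shown here only as a sketch, is a hypergeometric tail probability: if one list is treated as a random draw from a shared gene universe, how likely is an overlap at least as large as the one observed? The function name `overlap_p_value`, the universe size of 10000, and the second list size of 500 are hypothetical; only the 213 and 68 figures come from the text.

```python
from math import comb

def overlap_p_value(universe, list_a, list_b, overlap):
    """Hypergeometric tail P(X >= overlap), where X is the number of genes
    shared when list_b genes are drawn at random from a universe that
    contains list_a marked genes."""
    upper = min(list_a, list_b)
    total = comb(universe, list_b)
    tail = sum(
        comb(list_a, k) * comb(universe - list_a, list_b - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total

# Hypothetical universe and second-list size; 213 and 68 are taken from the text.
p = overlap_p_value(universe=10000, list_a=213, list_b=500, overlap=68)
print(p)
```

Exact integer arithmetic via `math.comb` avoids the rounding problems that arise when very small tail probabilities are accumulated in floating point.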



Table 4. Intersection of Significant Genes Identified Herein with HCC Genes 



UniGene
Identifier

Hs.119651


Gene
GPC3


Description

glypican 3


GenBank No.
U50410, AA775872


Hs.125180
Hs.180577
Hs.44585


GHR
GRN
TP53BP2


growth hormone receptor

granulin


X06562, N70358
AI375908,
AA496452


Hs.77326 
Hs.8867 


IGFBP3 
CYR61 


tumor protein p53-binding protein, 2
insulin-like growth factor binding protein 3 


AI123916, H69077 

BE336944, 

AA598601 


Hs.1600 


CCT5 
HSEC61 


cysteine-rich, angiogenic inducer, 61

chaperonin containing TCP1, subunit 5 

(epsilon) 

sec 61 homolog 


Y12084, AA777187
D43950, AA629692


Hs.75410
Hs.152931


HSPA5
LBR


heat shock 70kD protein 5 (glucose-
regulated protein, 78kD)


AL043206, 
AA962446 


Hs.7720 


DNCH1 


lamin B receptor 

dynein, cytoplasmic, heavy polypeptide 1


L25931, AA099136

AB002323, 

W78967 


Hs.4756 


FEN1 


flap structure-specific endonuclease 1


BE278623, 
AA620553 


Hs.50758 


SMC4L1 


SMC4 (structural maintenance of 
chromosomes 4, yeast)-like 1 
CAP-C chromosome associated 
polypeptide C 


AB019987, 
AA452095 


Hs.77254 
Hs.2934 


CBX1 

RRM1


chromobox homolog 1 (HP1 beta homolog 
Drosophila ) 


AL046741, 
AA448667 


Hs.156110


IGKC


immunoglobulin kappa constant


X59543, AA633549
AW404507, 
AA402920, 
AA486362 


Hs.20144 
Hs.237356 


SCYA14 
SDF1 


small inducible cytokine subfamily A (Cys- 

Cys), member 14 

stromal cell-derived factor 1 


NM_004166,
R96626

L36033, AA447115


Hs.78065
Hs.118638


C7

NME1


complement component 7
non-metastatic cells 1, protein (NM23A)
expressed in


X86328, AA598478
AA147871,
AA644092


Hs.12482


GNPAT 


glyceronephosphate O-acyltransferase 


AF043937, 
AA486845 


Hs.174140


ACLY


ATP citrate lyase


AW967351,
H08547


Hs.174220


CYP2C8 


cytochrome P450, subfamily IIC 
(mephenytoin 4-hydroxylase), polypeptide 
8 


M17398, N53136 


Hs.26403 


GSTZ1 


glutathione transferase zeta 1 
(maleylacetoacetate isomerase)


U86529, AA428334
Hs.4


ADH1B 


alcohol dehydrogenase 1B (class I), beta 
polypeptide 


M24317, N93428 


Hs.177592


RPLP1 


ribosomal protein, large, P1 


AW963733, 
AI732304 


Hs.25797 


SF3B4 


splicing factor 3b, subunit 4, 49kD 


NM_005850,
AA699361 


Hs.76230 


RPS10 


ribosomal protein S10 


AW245775,
AI054003 


Hs.76325 


IGJ 
SLU7 


immunoglobulin J polypeptide, linker 
protein for immunoglobulin alpha and mu 
polypeptides 

step II splicing factor SLU7


AW172754,
T70057 


Hs.83753 


SNRPB 


small nuclear ribonucleoprotein
polypeptides B and B1


BE252108,


Hs.115617


CRHBP


corticotropin releasing hormone-binding
protein


AA599116

NM_001882,


Hs.118249


ARFGEF2
BIG2


ADP-ribosylation factor guanine

nucleotide-exchange factor 2 (brefeldin A-
inhibited)

Brefeldin A-inhibited guanine nucleotide-
exchange protein


AA286752 

AA099582, N34053 


Hs.155079


PPP2R5A


protein phosphatase 2, regulatory subunit
B (B56), alpha isoform


AA234460, R59164


Hs.155637


PRKDC


protein kinase, DNA-activated, catalytic
polypeptide


U34994, R27615


Hs.1624


EFNA1 


ephrin-A1 


NM_004428,
AA857015


Hs.182278


CALM2


calmodulin 2 (phosphorylase kinase,
delta)


D45887, AA043551


Hs.199263


STK39
SPAK


serine threonine kinase 39 (STE20/SPS1
homolog, yeast) 
ste-20 related kinase 


F26137, H84871 


Hs.22785 


GABRE 


gamma-aminobutyric acid (GABA) A 
receptor, epsilon


NM_004961,
H63532


Hs.24950 


RGS5 


regulator of G-protein signalling 5 


AI674877, N34362, 
AA668470


Hs.296341 


CAP2 


adenylyl cyclase-associated protein 2 


AW779995,
AA040613 


Hs.81972 


SHC1 


SHC (Src homology 2 domain-containing) 
transforming protein 1 
nuclear receptor subfamily 4, group A, 
member 1 


X68148, R52960, 

T50498 

NM_002135, 
N94487 


Hs.1119 


NR4A1 


Hs.1657 


ESR1 


estrogen receptor 1 


AL078582, 
AA291702 


Hs.166891


RFX5


regulatory factor X, 5 (influences HLA

class II expression)


AL050135,
AA418045


Hs.252229 


MAFG 


v-maf musculoaponeurotic fibrosarcoma 
(avian) oncogene family, protein G 


AF059195, N21609


Hs.25647 


FOS 


v-fos FBJ murine osteosarcoma viral 
oncogene homolog 


V01512, N36944, 
AA485377


Hs.431 


BMI1 


murine leukemia viral (bmi-1) oncogene 
homolog 


AA884913, 
W90704,
AA478036 


Hs.75117 
Hs.75678 


ILF2
FOSB


interleukin enhancer binding factor 2,
45kD

FBJ murine osteosarcoma viral oncogene


AA307289,
H95638, AA894687
L49169, T61948
Hs.6551
Hs.79090 


ATP6IP1 

XPO1


homolog B

ATPase, H+ transporting, lysosomal
interacting protein 1


NM_001183,
AA487588 


Hs.80917 
Hs.80919 


AP3S1 


exportin 1 (CRM1, yeast, homolog)
adaptor-related protein complex 3, sigma
1 subunit

synaptophysin-like protein


D89729, T59055 
D63643, AA996044 


Hs.44532 
Hs.106061 


UBD
RDBP


diubiquitin


S72481, AA427447
NM_006398, N33920


Hs.108636 
Hs.110713 


C1orf9

CH1

DEK


RD RNA-binding protein
chromosome 1 open reading frame 9
membrane protein CH1


X16105, AA056390
BE466870, N36176


Hs.11355 


TMPO 


DEK oncogene (DNA binding)

Thymopoietin

ESTs


AI888504, R25377
U09087, T63980 


Hs.16341 
Hs.16426 


MAWBP 
PODXL 


MAWD binding protein 

ESTs weakly similar to predicted using 

genefinder [C. elegans]


AI866254, R54416


Hs.18443
Hs.194673


ALDH8A1 
PEA15 


podocalyxin-like 

aldehyde dehydrogenase 8 family,

member A1

ESTs


BE395330, N64508
AI051566, N70701


Hs.23767
Hs.291904


FLJ12666
DXS1357E 


phosphoprotein enriched in astrocytes 15 

hypothetical protein FLJ12666 

Homo sapiens cDNA FLJ12666 fis, clone 

NT2RM4002256 

accessory proteins BAP31/BAP29 


Y13736, AA293211
AW952494,
H10192,
AA115300,
AA131466
Z31696, AA625628 


Hs.3610 
Hs.36102 


KIAA0205 


KIAA0205 gene product 

ESTs, Highly similar to SMHU1B

metallothionein 1B [H.sapiens]

ESTs, Highly similar to MT1B_HUMAN

Metallothionein-IB [H.sapiens]


D86960, R91263
R99207, H72722


Hs.6838 


ARHE 


ras homolog gene family, member E 


W03441, 
AA443302 


Hs.75187 


TOMM20- 
PENDI 


translocase of outer mitochondrial 
membrane 20 (yeast) homolog 
KIAA0016 translocase of outer 
mitochondrial membrane 20 (yeast) 
homolog 


D13641, AA644550 


Hs.8765 


RNAHP 


RNA helicase-related protein


AI814448, T56221,
N55459 



The discriminator cassettes were assessed on an independent tissue set of 58
liver clinical biopsies from 29 patients. Using a kNN prediction algorithm, it was
found that all classifier probe cassettes could readily distinguish HCC tumor from
non-tumor liver (Table 5), and that the gene discriminators of tumor vs. non-tumor in
HCC derived by the intersect analysis of limited tissue sets can be validated in an
independent manner.
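
As an illustration of the kNN prediction step, the sketch below classifies a query expression profile by majority vote among its nearest training profiles. The value of k, the Euclidean distance, and the toy two-gene profiles are assumptions made for the example; the specification does not give these details here.

```python
import math
from collections import Counter

def knn_predict(train_profiles, train_labels, query, k=3):
    """Return the majority label among the k training profiles closest
    (by Euclidean distance) to the query profile."""
    ranked = sorted(
        (math.dist(profile, query), label)
        for profile, label in zip(train_profiles, train_labels)
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical two-gene expression profiles.
profiles = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
labels = ["non-tumor", "non-tumor", "non-tumor", "tumor", "tumor", "tumor"]

print(knn_predict(profiles, labels, (0.2, 0.1)))
print(knn_predict(profiles, labels, (5.5, 5.2)))
```

An odd k avoids tied votes in a two-class problem such as tumor versus non-tumor.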
Table 5. Prediction accuracy of gene classifiers using kNN algorithm on 58 liver
biopsies from 29 patients.



Gene 
classifiers 



Table 1 



No. of 

gene classifiers 



218 



Misclassification 
rate 



4 of 58 



No. of

false negative 

cases* 

4 



No. of false 
positive cases* 



Predictive 
accuracy 



Table 3 



3 of 58 



166 



3 of 58 



95% 
95% 



2 of 58 



*False negative cases refer to HCC tumors which were misclassified as non-tumor livers.
False positive cases refer to non-tumor livers which were misclassified as HCC tumors.
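
The relationship between a misclassification rate and a predictive accuracy can be checked with a few lines of Python. The sketch assumes that predictive accuracy means the percentage of the 58 biopsies classified correctly, a definition the table does not spell out; under that assumption, 3 of 58 misclassified corresponds to roughly 95%.

```python
def predictive_accuracy(misclassified, total):
    """Percentage of samples classified correctly (assumed definition)."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * (total - misclassified) / total

for errors in (2, 3, 4):
    print(f"{errors} of 58 misclassified: {predictive_accuracy(errors, 58):.1f}% correct")
```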
OTHER EMBODIMENTS

Although various embodiments of the invention are disclosed herein, many 
adaptations and modifications may be made within the scope of the invention in 
accordance with the common general knowledge of those skilled in this art. Such 
modifications include the substitution of known equivalents for any aspect of the 
invention in order to achieve the same result in substantially the same way. Accession 
numbers, as used herein, may refer to Accession numbers from multiple databases, 
including GenBank, the European Molecular Biology Laboratory (EMBL), the DNA 
Database of Japan (DDBJ), or the Genome Sequence Data Base (GSDB), for 
nucleotide sequences, and including the Protein Information Resource (PIR), 
SWISSPROT, Protein Research Foundation (PRF), and Protein Data Bank (PDB) 
(sequences from solved structures), as well as from translations from annotated 
coding regions from nucleotide sequences in GenBank, EMBL, DDBJ, or RefSeq, for 
polypeptide sequences. Accession numbers, as used herein, may also refer to 
Accession numbers from databases such as UniGene, OMIM, LocusLink, or 
HomoloGene. Numeric ranges are inclusive of the numbers defining the range. In the 
specification, the word "comprising" is used as an open-ended term, substantially 
equivalent to the phrase "including, but not limited to", and the word "comprises" has 
a corresponding meaning. Citation of references herein shall not be construed as an 
admission that such references are prior art to the present invention. All publications 
are incorporated herein by reference as if each individual publication were 
specifically and individually indicated to be incorporated by reference herein and as 
though fully set forth herein. The invention includes all embodiments and variations 
substantially as hereinbefore described and with reference to the examples and 
drawings. 